* Re: [dm-devel] [PATCH 1/1] multipath-tools: Change path checker for IBM IPR devices [not found] ` <5424472E.4060300@linux.vnet.ibm.com> @ 2014-09-25 16:57 ` Christoph Hellwig 2014-09-30 18:05 ` wenxiong 0 siblings, 1 reply; 6+ messages in thread From: Christoph Hellwig @ 2014-09-25 16:57 UTC (permalink / raw) To: Brian King; +Cc: device-mapper development, linux-scsi, James Bottomley On Thu, Sep 25, 2014 at 11:47:42AM -0500, Brian King wrote: > The issue we've run into started when this patch started making its > way into distros: > > http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/scsi/scsi_error.c?id=14216561e164671ce147458653b1fea06a4ada1e > > That changed the behaviour for user initiated TUR commands. After an ipr > adapter gets reset, all disk array devices require a start unit command > to be issued to them before they will accept commands. So, with the SCSI > EH change, we now end up in a scenario with dual ipr adapters where the > TUR getting issued from the health checker returns with a Not Ready response > and since SCSI EH no longer triggers the Start Unit in this scenario, > the path never recovers. > > The alternative solution would be to change the TUR path checker in multipath-tools > to issue a Start Unit if it sees a 02/04/02. Or we could fix up the check introduced by the commit, with something ala: diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c index a2c3d3d..7228d9e 100644 --- a/drivers/scsi/scsi_error.c +++ b/drivers/scsi/scsi_error.c @@ -459,13 +459,18 @@ static int scsi_check_sense(struct scsi_cmnd *scmd) if (! scsi_command_normalize_sense(scmd, &sshdr)) return FAILED; /* no valid sense data */ - if (scmd->cmnd[0] == TEST_UNIT_READY && scmd->scsi_done != scsi_eh_done) + if (scmd->cmnd[0] == TEST_UNIT_READY && + scmd->request->cmd_type == REQ_TYPE_FS && + scmd->scsi_done != scsi_eh_done) { /* * nasty: for mid-layer issued TURs, we need to return the * actual sense data without any recovery attempt. For eh - * issued ones, we need to try to recover and interpret + * issued ones, we need to try to recover and interpret, + * and for pass through TURs we just need to stay out of the + * way, so that the device handlers can do the right thing. */ return SUCCESS; + } scsi_report_sense(sdev, &sshdr); > > Thanks, > > Brian > > -- > Brian King > Power Linux I/O > IBM Linux Technology Center > > > -- > dm-devel mailing list > dm-devel@redhat.com > https://www.redhat.com/mailman/listinfo/dm-devel ---end quoted text--- ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [dm-devel] [PATCH 1/1] multipath-tools: Change path checker for IBM IPR devices 2014-09-25 16:57 ` [dm-devel] [PATCH 1/1] multipath-tools: Change path checker for IBM IPR devices Christoph Hellwig @ 2014-09-30 18:05 ` wenxiong 2014-10-01 12:51 ` Christoph Hellwig 0 siblings, 1 reply; 6+ messages in thread From: wenxiong @ 2014-09-30 18:05 UTC (permalink / raw) To: device-mapper development, Christoph Hellwig Cc: Brian King, James Bottomley, linux-scsi Quoting Christoph Hellwig <hch@infradead.org>: > On Thu, Sep 25, 2014 at 11:47:42AM -0500, Brian King wrote: >> The issue we've run into started when this patch started making its >> way into distros: >> >> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/scsi/scsi_error.c?id=14216561e164671ce147458653b1fea06a4ada1e >> >> That changed the behaviour for user initiated TUR commands. After an ipr >> adapter gets reset, all disk array devices require a start unit command >> to be issued to them before they will accept commands. So, with the SCSI >> EH change, we now end up in a scenario with dual ipr adapters where the >> TUR getting issued from the health checker returns with a Not Ready response >> and since SCSI EH no longer triggers the Start Unit in this scenario, >> the path never recovers. >> >> The alternative solution would be to change the TUR path checker in >> multipath-tools >> to issue a Start Unit if it sees a 02/04/02. > > Or we could fix up the check introduced by the commit, with something > ala: > > diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c > index a2c3d3d..7228d9e 100644 > --- a/drivers/scsi/scsi_error.c > +++ b/drivers/scsi/scsi_error.c > @@ -459,13 +459,18 @@ static int scsi_check_sense(struct scsi_cmnd *scmd) > if (! scsi_command_normalize_sense(scmd, &sshdr)) > return FAILED; /* no valid sense data */ > > - if (scmd->cmnd[0] == TEST_UNIT_READY && scmd->scsi_done != scsi_eh_done) > + if (scmd->cmnd[0] == TEST_UNIT_READY && > + scmd->request->cmd_type == REQ_TYPE_FS && > + scmd->scsi_done != scsi_eh_done) { > /* > * nasty: for mid-layer issued TURs, we need to return the > * actual sense data without any recovery attempt. For eh > - * issued ones, we need to try to recover and interpret > + * issued ones, we need to try to recover and interpret, > + * and for pass through TURs we just need to stay out of the > + * way, so that the device handlers can do the right thing. > */ > return SUCCESS; > + } > > scsi_report_sense(sdev, &sshdr); > > Hi Christoph, We have verified above patch in our test group system yesterday and today. It works fine with their testcases. Thanks, Wendy >> >> Thanks, >> >> Brian >> >> -- >> Brian King >> Power Linux I/O >> IBM Linux Technology Center >> >> >> -- >> dm-devel mailing list >> dm-devel@redhat.com >> https://www.redhat.com/mailman/listinfo/dm-devel > ---end quoted text--- > > -- > dm-devel mailing list > dm-devel@redhat.com > https://www.redhat.com/mailman/listinfo/dm-devel ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [dm-devel] [PATCH 1/1] multipath-tools: Change path checker for IBM IPR devices 2014-09-30 18:05 ` wenxiong @ 2014-10-01 12:51 ` Christoph Hellwig 2014-10-06 15:22 ` Brian King 0 siblings, 1 reply; 6+ messages in thread From: Christoph Hellwig @ 2014-10-01 12:51 UTC (permalink / raw) To: wenxiong Cc: device-mapper development, Christoph Hellwig, Brian King, James Bottomley, linux-scsi Unfortunately the patch wasn't quite correct - all TEST_UNIT_READY commands are sent as BLOCK_PC, so this would basically revert James' original fix for the SATL case. Am I right to assume you only need the call to scsi_dh->check_sense and not the rest of the handling for the multipath path checker? If that's the case something like the patch below sould work: diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c index 5db8454..399c1c8 100644 --- a/drivers/scsi/scsi_error.c +++ b/drivers/scsi/scsi_error.c @@ -459,14 +459,6 @@ static int scsi_check_sense(struct scsi_cmnd *scmd) if (! scsi_command_normalize_sense(scmd, &sshdr)) return FAILED; /* no valid sense data */ - if (scmd->cmnd[0] == TEST_UNIT_READY && scmd->scsi_done != scsi_eh_done) - /* - * nasty: for mid-layer issued TURs, we need to return the - * actual sense data without any recovery attempt. For eh - * issued ones, we need to try to recover and interpret - */ - return SUCCESS; - scsi_report_sense(sdev, &sshdr); if (scsi_sense_is_deferred(&sshdr)) @@ -482,6 +474,14 @@ static int scsi_check_sense(struct scsi_cmnd *scmd) /* handler does not care. Drop down to default handling */ } + if (scmd->cmnd[0] == TEST_UNIT_READY && scmd->scsi_done != scsi_eh_done) + /* + * nasty: for mid-layer issued TURs, we need to return the + * actual sense data without any recovery attempt. For eh + * issued ones, we need to try to recover and interpret + */ + return SUCCESS; + /* * Previous logic looked for FILEMARK, EOM or ILI which are * mainly associated with tapes and returned SUCCESS. ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [dm-devel] [PATCH 1/1] multipath-tools: Change path checker for IBM IPR devices 2014-10-01 12:51 ` Christoph Hellwig @ 2014-10-06 15:22 ` Brian King 2014-10-06 21:50 ` wenxiong 0 siblings, 1 reply; 6+ messages in thread From: Brian King @ 2014-10-06 15:22 UTC (permalink / raw) To: Christoph Hellwig, wenxiong Cc: device-mapper development, James Bottomley, linux-scsi On 10/01/2014 07:51 AM, Christoph Hellwig wrote: > Unfortunately the patch wasn't quite correct - all TEST_UNIT_READY > commands are sent as BLOCK_PC, so this would basically revert James' > original fix for the SATL case. > > Am I right to assume you only need the call to scsi_dh->check_sense and > not the rest of the handling for the multipath path checker? If that's > the case something like the patch below sould work: This would work if we also duplicated the 02/04/02 K/C/Q check in alua_check_sense handler. Wendy - can you try my patch below, along with Christoph's latest patch here and see if that resolves the issue? Thanks, Brian > > diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c > index 5db8454..399c1c8 100644 > --- a/drivers/scsi/scsi_error.c > +++ b/drivers/scsi/scsi_error.c > @@ -459,14 +459,6 @@ static int scsi_check_sense(struct scsi_cmnd *scmd) > if (! scsi_command_normalize_sense(scmd, &sshdr)) > return FAILED; /* no valid sense data */ > > - if (scmd->cmnd[0] == TEST_UNIT_READY && scmd->scsi_done != scsi_eh_done) > - /* > - * nasty: for mid-layer issued TURs, we need to return the > - * actual sense data without any recovery attempt. For eh > - * issued ones, we need to try to recover and interpret > - */ > - return SUCCESS; > - > scsi_report_sense(sdev, &sshdr); > > if (scsi_sense_is_deferred(&sshdr)) > @@ -482,6 +474,14 @@ static int scsi_check_sense(struct scsi_cmnd *scmd) > /* handler does not care. Drop down to default handling */ > } > > + if (scmd->cmnd[0] == TEST_UNIT_READY && scmd->scsi_done != scsi_eh_done) > + /* > + * nasty: for mid-layer issued TURs, we need to return the > + * actual sense data without any recovery attempt. For eh > + * issued ones, we need to try to recover and interpret > + */ > + return SUCCESS; > + > /* > * Previous logic looked for FILEMARK, EOM or ILI which are > * mainly associated with tapes and returned SUCCESS. > Signed-off-by: Brian King <brking@linux.vnet.ibm.com> --- drivers/scsi/device_handler/scsi_dh_alua.c | 7 +++++++ 1 file changed, 7 insertions(+) diff -puN drivers/scsi/device_handler/scsi_dh_alua.c~alua_allow_restart drivers/scsi/device_handler/scsi_dh_alua.c --- linux/drivers/scsi/device_handler/scsi_dh_alua.c~alua_allow_restart 2014-10-06 10:19:16.184798305 -0500 +++ linux-bjking1/drivers/scsi/device_handler/scsi_dh_alua.c 2014-10-06 10:20:35.743165951 -0500 @@ -474,6 +474,13 @@ static int alua_check_sense(struct scsi_ * LUN Not Ready -- Offline */ return SUCCESS; + if (sdev->allow_restart && + (sense_hdr->asc == 0x04) && (sense_hdr->ascq == 0x02)) + /* + * if the device is not started, we need to wake + * the error handler to start the motor + */ + return FAILED; break; case UNIT_ATTENTION: if (sense_hdr->asc == 0x29 && sense_hdr->ascq == 0x00) _ ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [dm-devel] [PATCH 1/1] multipath-tools: Change path checker for IBM IPR devices 2014-10-06 15:22 ` Brian King @ 2014-10-06 21:50 ` wenxiong 2014-10-21 11:03 ` Christoph Hellwig 0 siblings, 1 reply; 6+ messages in thread From: wenxiong @ 2014-10-06 21:50 UTC (permalink / raw) To: device-mapper development, Brian King Cc: Christoph Hellwig, linux-scsi, James Bottomley Quoting Brian King <brking@linux.vnet.ibm.com>: > On 10/01/2014 07:51 AM, Christoph Hellwig wrote: >> Unfortunately the patch wasn't quite correct - all TEST_UNIT_READY >> commands are sent as BLOCK_PC, so this would basically revert James' >> original fix for the SATL case. >> >> Am I right to assume you only need the call to scsi_dh->check_sense and >> not the rest of the handling for the multipath path checker? If that's >> the case something like the patch below sould work: > > This would work if we also duplicated the 02/04/02 K/C/Q check in > alua_check_sense > handler. > > Wendy - can you try my patch below, along with Christoph's latest patch here > and see if that resolves the issue? > > Thanks, > > Brian > >> >> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c >> index 5db8454..399c1c8 100644 >> --- a/drivers/scsi/scsi_error.c >> +++ b/drivers/scsi/scsi_error.c >> @@ -459,14 +459,6 @@ static int scsi_check_sense(struct scsi_cmnd *scmd) >> if (! scsi_command_normalize_sense(scmd, &sshdr)) >> return FAILED; /* no valid sense data */ >> >> - if (scmd->cmnd[0] == TEST_UNIT_READY && scmd->scsi_done != scsi_eh_done) >> - /* >> - * nasty: for mid-layer issued TURs, we need to return the >> - * actual sense data without any recovery attempt. For eh >> - * issued ones, we need to try to recover and interpret >> - */ >> - return SUCCESS; >> - >> scsi_report_sense(sdev, &sshdr); >> >> if (scsi_sense_is_deferred(&sshdr)) >> @@ -482,6 +474,14 @@ static int scsi_check_sense(struct scsi_cmnd *scmd) >> /* handler does not care. Drop down to default handling */ >> } >> >> + if (scmd->cmnd[0] == TEST_UNIT_READY && scmd->scsi_done != scsi_eh_done) >> + /* >> + * nasty: for mid-layer issued TURs, we need to return the >> + * actual sense data without any recovery attempt. For eh >> + * issued ones, we need to try to recover and interpret >> + */ >> + return SUCCESS; >> + >> /* >> * Previous logic looked for FILEMARK, EOM or ILI which are >> * mainly associated with tapes and returned SUCCESS. >> > > > > Signed-off-by: Brian King <brking@linux.vnet.ibm.com> > --- > > drivers/scsi/device_handler/scsi_dh_alua.c | 7 +++++++ > 1 file changed, 7 insertions(+) > > diff -puN > drivers/scsi/device_handler/scsi_dh_alua.c~alua_allow_restart > drivers/scsi/device_handler/scsi_dh_alua.c > --- > linux/drivers/scsi/device_handler/scsi_dh_alua.c~alua_allow_restart 2014-10-06 10:19:16.184798305 > -0500 > +++ > linux-bjking1/drivers/scsi/device_handler/scsi_dh_alua.c 2014-10-06 > 10:20:35.743165951 -0500 > @@ -474,6 +474,13 @@ static int alua_check_sense(struct scsi_ > * LUN Not Ready -- Offline > */ > return SUCCESS; > + if (sdev->allow_restart && > + (sense_hdr->asc == 0x04) && (sense_hdr->ascq == 0x02)) > + /* > + * if the device is not started, we need to wake > + * the error handler to start the motor > + */ > + return FAILED; > break; > case UNIT_ATTENTION: > if (sense_hdr->asc == 0x29 && sense_hdr->ascq == 0x00) > _ > > -- Sorry it took some time since we need to re-config the systems for this test. With Christoph's new patch only, still saw the failure. With Christoph's new patch + Brian's patch, works fine, didn't see the failure. Thanks, Wendy > dm-devel mailing list > dm-devel@redhat.com > https://www.redhat.com/mailman/listinfo/dm-devel ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [dm-devel] [PATCH 1/1] multipath-tools: Change path checker for IBM IPR devices 2014-10-06 21:50 ` wenxiong @ 2014-10-21 11:03 ` Christoph Hellwig 0 siblings, 0 replies; 6+ messages in thread From: Christoph Hellwig @ 2014-10-21 11:03 UTC (permalink / raw) To: wenxiong Cc: device-mapper development, Brian King, Christoph Hellwig, linux-scsi, James Bottomley On Mon, Oct 06, 2014 at 05:50:32PM -0400, wenxiong@linux.vnet.ibm.com wrote: > Sorry it took some time since we need to re-config the systems for this test. > > With Christoph's new patch only, still saw the failure. > With Christoph's new patch + Brian's patch, works fine, didn't see the > failure. Can one of you send me a tested series with both patches? Thanks! ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2014-10-21 11:03 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20140924195721.148637349@linux.vnet.ibm.com>
[not found] ` <20140924195752.289011902@linux.vnet.ibm.com>
[not found] ` <5423B664.2020007@suse.de>
[not found] ` <5424472E.4060300@linux.vnet.ibm.com>
2014-09-25 16:57 ` [dm-devel] [PATCH 1/1] multipath-tools: Change path checker for IBM IPR devices Christoph Hellwig
2014-09-30 18:05 ` wenxiong
2014-10-01 12:51 ` Christoph Hellwig
2014-10-06 15:22 ` Brian King
2014-10-06 21:50 ` wenxiong
2014-10-21 11:03 ` Christoph Hellwig
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox