* [PATCH 0/2] TUR path is down after adapter gets reset in multipath configuration @ 2014-10-27 18:01 wenxiong 2014-10-27 18:01 ` [PATCH 1/2] scsi: TUR path is down after adapter gets reset in multipath configuration(scsi_error.c) wenxiong 2014-10-27 18:01 ` [PATCH 2/2] scsi: TUR path is down after adapter gets reset in multipath configuration(scsi_dh_alus.c) wenxiong 0 siblings, 2 replies; 7+ messages in thread From: wenxiong @ 2014-10-27 18:01 UTC (permalink / raw) To: James.Bottomley; +Cc: hch, linux-scsi, brking Based on Christoph's and Brian's patches, these patch fix the TUR path is down after adapter gets reset issue. Thanks, Wendy -- ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH 1/2] scsi: TUR path is down after adapter gets reset in multipath configuration(scsi_error.c) 2014-10-27 18:01 [PATCH 0/2] TUR path is down after adapter gets reset in multipath configuration wenxiong @ 2014-10-27 18:01 ` wenxiong 2014-10-27 19:07 ` Elliott, Robert (Server Storage) 2014-10-28 9:04 ` Christoph Hellwig 2014-10-27 18:01 ` [PATCH 2/2] scsi: TUR path is down after adapter gets reset in multipath configuration(scsi_dh_alus.c) wenxiong 1 sibling, 2 replies; 7+ messages in thread From: wenxiong @ 2014-10-27 18:01 UTC (permalink / raw) To: James.Bottomley; +Cc: hch, linux-scsi, brking [-- Attachment #1: allow_restart1 --] [-- Type: text/plain, Size: 1928 bytes --] After an ipr adapter gets reset, all disk array devices require a start unit command to be issued to them before they will accept commands. So, with the SCSI EH change, we now end up in a scenario with dual ipr adapters where the TUR getting issued from the health checker returns with a Not Ready response and since SCSI EH no longer triggers the Start Unit in this scenario, the path never recovers. Signed-off-by: Christoph Hellwig <hch@infradead.org> Tested-by: Wen Xiong <wenxiong@linux.vnet.ibm.com> --- drivers/scsi/scsi_error.c | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) Index: b/drivers/scsi/scsi_error.c =================================================================== --- a/drivers/scsi/scsi_error.c 2014-10-23 12:54:16.000000000 -0500 +++ b/drivers/scsi/scsi_error.c 2014-10-23 12:57:44.642078988 -0500 @@ -459,14 +459,6 @@ static int scsi_check_sense(struct scsi_ if (! scsi_command_normalize_sense(scmd, &sshdr)) return FAILED; /* no valid sense data */ - if (scmd->cmnd[0] == TEST_UNIT_READY && scmd->scsi_done != scsi_eh_done) - /* - * nasty: for mid-layer issued TURs, we need to return the - * actual sense data without any recovery attempt. For eh - * issued ones, we need to try to recover and interpret - */ - return SUCCESS; - scsi_report_sense(sdev, &sshdr); if (scsi_sense_is_deferred(&sshdr)) @@ -482,6 +474,14 @@ static int scsi_check_sense(struct scsi_ /* handler does not care. Drop down to default handling */ } + if (scmd->cmnd[0] == TEST_UNIT_READY && scmd->scsi_done != scsi_eh_done) + /* + * nasty: for mid-layer issued TURs, we need to return the + * actual sense data without any recovery attempt. For eh + * issued ones, we need to try to recover and interpret + */ + return SUCCESS; + /* * Previous logic looked for FILEMARK, EOM or ILI which are * mainly associated with tapes and returned SUCCESS. -- ^ permalink raw reply [flat|nested] 7+ messages in thread
* RE: [PATCH 1/2] scsi: TUR path is down after adapter gets reset in multipath configuration(scsi_error.c) 2014-10-27 18:01 ` [PATCH 1/2] scsi: TUR path is down after adapter gets reset in multipath configuration(scsi_error.c) wenxiong @ 2014-10-27 19:07 ` Elliott, Robert (Server Storage) 2014-10-28 9:04 ` Christoph Hellwig 1 sibling, 0 replies; 7+ messages in thread From: Elliott, Robert (Server Storage) @ 2014-10-27 19:07 UTC (permalink / raw) To: wenxiong@linux.vnet.ibm.com, James.Bottomley@HansenPartnership.com Cc: hch@infradead.org, linux-scsi@vger.kernel.org, brking@linux.vnet.ibm.com > -----Original Message----- > From: linux-scsi-owner@vger.kernel.org [mailto:linux-scsi- > owner@vger.kernel.org] On Behalf Of wenxiong@linux.vnet.ibm.com > Sent: Monday, 27 October, 2014 1:02 PM > To: James.Bottomley@HansenPartnership.com > Cc: hch@infradead.org; linux-scsi@vger.kernel.org; > brking@linux.vnet.ibm.com > Subject: [PATCH 1/2] scsi: TUR path is down after adapter gets reset > in multipath configuration(scsi_error.c) > > After an ipr adapter gets reset, all disk array devices require a > start unit command to be issued to them before they will accept > commands. So, with the SCSI EH change, we now end up in a scenario > with dual ipr adapters where the TUR getting issued from the health > checker returns with a Not Ready response and since SCSI EH no > longer triggers the Start Unit in this scenario, the path never > recovers. > This description doesn't make sense without reading the thread: [dm-devel] [PATCH 1/1] multipath-tools: Change path checker for IBM IPR devices Example: what is "the SCSI EH change"? ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 1/2] scsi: TUR path is down after adapter gets reset in multipath configuration(scsi_error.c) 2014-10-27 18:01 ` [PATCH 1/2] scsi: TUR path is down after adapter gets reset in multipath configuration(scsi_error.c) wenxiong 2014-10-27 19:07 ` Elliott, Robert (Server Storage) @ 2014-10-28 9:04 ` Christoph Hellwig 1 sibling, 0 replies; 7+ messages in thread From: Christoph Hellwig @ 2014-10-28 9:04 UTC (permalink / raw) To: wenxiong; +Cc: James.Bottomley, hch, linux-scsi, brking On Mon, Oct 27, 2014 at 01:01:48PM -0500, wenxiong@linux.vnet.ibm.com wrote: > After an ipr adapter gets reset, all disk array devices require a start > unit command to be issued to them before they will accept commands. So, > with the SCSI EH change, we now end up in a scenario with dual ipr > adapters where the TUR getting issued from the health checker returns > with a Not Ready response and since SCSI EH no longer triggers the Start > Unit in this scenario, the path never recovers. > > Signed-off-by: Christoph Hellwig <hch@infradead.org> > Tested-by: Wen Xiong <wenxiong@linux.vnet.ibm.com> Th patch description is entirely wrong, but I guess I'll have to take the blame for that as it's my patch. How about something like: From: Christoph Hellwig <hch@infradead.org> Subject: scsi: call device handler for failed TUR command Multipath devices using the TUR path checker need to see the sense code for a failed TUR command in their device handler. Since commit <insert commit id here> we always return success for mid layer issued TUR commands before calling the device handler, which stopped the TUR path checker from working. Move the call to the device handler check sense method before the early return for TUR commands to give the device handler a chance to intercept them. Signed-off-by: Christoph Hellwig <hch@infradead.org> Tested-by: Wen Xiong <wenxiong@linux.vnet.ibm.com> ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH 2/2] scsi: TUR path is down after adapter gets reset in multipath configuration(scsi_dh_alus.c) 2014-10-27 18:01 [PATCH 0/2] TUR path is down after adapter gets reset in multipath configuration wenxiong 2014-10-27 18:01 ` [PATCH 1/2] scsi: TUR path is down after adapter gets reset in multipath configuration(scsi_error.c) wenxiong @ 2014-10-27 18:01 ` wenxiong 2014-10-27 19:56 ` Elliott, Robert (Server Storage) 1 sibling, 1 reply; 7+ messages in thread From: wenxiong @ 2014-10-27 18:01 UTC (permalink / raw) To: James.Bottomley; +Cc: hch, linux-scsi, brking [-- Attachment #1: allow_restart2 --] [-- Type: text/plain, Size: 1009 bytes --] This patch also fixes the 02/04/02 K/C/Q check in alua_check_sense handler. Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Teste-by: Wen Xiong <wenxiong@linux.vnet.ibm.com> --- drivers/scsi/device_handler/scsi_dh_alua.c | 7 +++++++ 1 file changed, 7 insertions(+) Index: b/drivers/scsi/device_handler/scsi_dh_alua.c =================================================================== --- a/drivers/scsi/device_handler/scsi_dh_alua.c 2014-10-23 13:00:45.000000000 -0500 +++ b/drivers/scsi/device_handler/scsi_dh_alua.c 2014-10-23 13:04:16.152079004 -0500 @@ -474,6 +474,13 @@ static int alua_check_sense(struct scsi_ * LUN Not Ready -- Offline */ return SUCCESS; + if (sdev->allow_restart && + (sense_hdr->asc == 0x04) && (sense_hdr->ascq == 0x02)) + /* + * if the device is not started, we need to wake + * the error handler to start the motor + */ + return FAILED; break; case UNIT_ATTENTION: if (sense_hdr->asc == 0x29 && sense_hdr->ascq == 0x00) -- ^ permalink raw reply [flat|nested] 7+ messages in thread
* RE: [PATCH 2/2] scsi: TUR path is down after adapter gets reset in multipath configuration(scsi_dh_alus.c) 2014-10-27 18:01 ` [PATCH 2/2] scsi: TUR path is down after adapter gets reset in multipath configuration(scsi_dh_alus.c) wenxiong @ 2014-10-27 19:56 ` Elliott, Robert (Server Storage) 2014-10-27 22:48 ` Brian King 0 siblings, 1 reply; 7+ messages in thread From: Elliott, Robert (Server Storage) @ 2014-10-27 19:56 UTC (permalink / raw) To: wenxiong@linux.vnet.ibm.com, James.Bottomley@HansenPartnership.com Cc: hch@infradead.org, linux-scsi@vger.kernel.org, brking@linux.vnet.ibm.com > -----Original Message----- > From: linux-scsi-owner@vger.kernel.org [mailto:linux-scsi- > owner@vger.kernel.org] On Behalf Of wenxiong@linux.vnet.ibm.com > Sent: Monday, 27 October, 2014 1:02 PM > To: James.Bottomley@HansenPartnership.com > Cc: hch@infradead.org; linux-scsi@vger.kernel.org; > brking@linux.vnet.ibm.com > Subject: [PATCH 2/2] scsi: TUR path is down after adapter gets reset > in multipath configuration(scsi_dh_alus.c) > > This patch also fixes the 02/04/02 K/C/Q check in alua_check_sense > handler. > > Signed-off-by: Brian King <brking@linux.vnet.ibm.com> > Teste-by: Wen Xiong <wenxiong@linux.vnet.ibm.com> Missing a d > --- > drivers/scsi/device_handler/scsi_dh_alua.c | 7 +++++++ > 1 file changed, 7 insertions(+) > > Index: b/drivers/scsi/device_handler/scsi_dh_alua.c > =================================================================== > --- a/drivers/scsi/device_handler/scsi_dh_alua.c 2014-10-23 > 13:00:45.000000000 -0500 > +++ b/drivers/scsi/device_handler/scsi_dh_alua.c 2014-10-23 > 13:04:16.152079004 -0500 > @@ -474,6 +474,13 @@ static int alua_check_sense(struct scsi_ > * LUN Not Ready -- Offline > */ > return SUCCESS; > + if (sdev->allow_restart && > + (sense_hdr->asc == 0x04) && (sense_hdr->ascq == > 0x02)) The coding style in that function does not include the extra parenthesis, as shown by the next excerpt: > + /* > + * if the device is not started, we need to wake > + * the error handler to start the motor > + */ > + return FAILED; > break; > case UNIT_ATTENTION: > if (sense_hdr->asc == 0x29 && sense_hdr->ascq == 0x00) Thus function is used several places: * installed as the scsi_device_handler .check_sense function * called to parse the response to REPORT TARGET PORT GROUPS in alua_rtpg * called to parse the response to SET TARGET PORT GROUPS in stpg_endio I'm not sure that adding NOT READY/LOGICAL UNIT NOT READY, INITIALIZING COMMAND REQUIRED (2h/04h/02h) is a good idea for the second case. The expected way to handle that response is to send START STOP UNIT with START=1. There are conditions in which REPORT TARGET PORT GROUPS is allowed and START STOP UNIT with START=1 is not allowed: * CbCS (capabilities-based command security) only allows START STOP UNIT if physical access (PHY ACC) is enabled, while REPORT TARGET PORT GROUPS is always allowed. * the standby or unavailable asymmetric access states only guarantee that REPORT TARGET PORT GROUPS is allowed, not START STOP UNIT. The device is permitted to support START STOP UNIT, but it's not required. So, it's really not a response that should be returned for that command. Any device that does return that response must also support START STOP UNIT or it's misleading the application client. In that case, falling through to the EH to send START STOP UNIT is the right thing to do. SET TARGET PORT GROUPS is questionable too; it also has different CbCS permissions and asymmetric access state requirements than START STOP UNIT with START=1. Perhaps that new "return FAILED" should be skipped if the opcode is REPORT TARGET PORT GROUPS or SET TARGET PORT GROUPS? --- Rob Elliott HP Server Storage ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 2/2] scsi: TUR path is down after adapter gets reset in multipath configuration(scsi_dh_alus.c) 2014-10-27 19:56 ` Elliott, Robert (Server Storage) @ 2014-10-27 22:48 ` Brian King 0 siblings, 0 replies; 7+ messages in thread From: Brian King @ 2014-10-27 22:48 UTC (permalink / raw) To: Elliott, Robert (Server Storage), wenxiong@linux.vnet.ibm.com, James.Bottomley@HansenPartnership.com Cc: hch@infradead.org, linux-scsi@vger.kernel.org On 10/27/2014 02:56 PM, Elliott, Robert (Server Storage) wrote: >> -----Original Message----- >> From: linux-scsi-owner@vger.kernel.org [mailto:linux-scsi- >> owner@vger.kernel.org] On Behalf Of wenxiong@linux.vnet.ibm.com >> Sent: Monday, 27 October, 2014 1:02 PM >> To: James.Bottomley@HansenPartnership.com >> Cc: hch@infradead.org; linux-scsi@vger.kernel.org; >> brking@linux.vnet.ibm.com >> Subject: [PATCH 2/2] scsi: TUR path is down after adapter gets reset >> in multipath configuration(scsi_dh_alus.c) >> >> This patch also fixes the 02/04/02 K/C/Q check in alua_check_sense >> handler. >> >> Signed-off-by: Brian King <brking@linux.vnet.ibm.com> >> Teste-by: Wen Xiong <wenxiong@linux.vnet.ibm.com> > > Missing a d > >> --- >> drivers/scsi/device_handler/scsi_dh_alua.c | 7 +++++++ >> 1 file changed, 7 insertions(+) >> >> Index: b/drivers/scsi/device_handler/scsi_dh_alua.c >> =================================================================== >> --- a/drivers/scsi/device_handler/scsi_dh_alua.c 2014-10-23 >> 13:00:45.000000000 -0500 >> +++ b/drivers/scsi/device_handler/scsi_dh_alua.c 2014-10-23 >> 13:04:16.152079004 -0500 >> @@ -474,6 +474,13 @@ static int alua_check_sense(struct scsi_ >> * LUN Not Ready -- Offline >> */ >> return SUCCESS; >> + if (sdev->allow_restart && >> + (sense_hdr->asc == 0x04) && (sense_hdr->ascq == >> 0x02)) > > The coding style in that function does not include the extra > parenthesis, as shown by the next excerpt: I just lifted this bit from scsi_error.c without noticing the coding style difference. We can certainly change this. > >> + /* >> + * if the device is not started, we need to wake >> + * the error handler to start the motor >> + */ >> + return FAILED; >> break; >> case UNIT_ATTENTION: >> if (sense_hdr->asc == 0x29 && sense_hdr->ascq == 0x00) > > Thus function is used several places: > * installed as the scsi_device_handler .check_sense function > * called to parse the response to REPORT TARGET PORT GROUPS > in alua_rtpg > * called to parse the response to SET TARGET PORT GROUPS > in stpg_endio > > I'm not sure that adding NOT READY/LOGICAL UNIT NOT READY, > INITIALIZING COMMAND REQUIRED (2h/04h/02h) is a good idea > for the second case. The expected way to handle that > response is to send START STOP UNIT with START=1. > > There are conditions in which REPORT TARGET PORT GROUPS > is allowed and START STOP UNIT with START=1 is not allowed: > * CbCS (capabilities-based command security) only allows > START STOP UNIT if physical access (PHY ACC) is enabled, > while REPORT TARGET PORT GROUPS is always allowed. > * the standby or unavailable asymmetric access states only > guarantee that REPORT TARGET PORT GROUPS is allowed, not > START STOP UNIT. The device is permitted to support START > STOP UNIT, but it's not required. > > So, it's really not a response that should be returned for > that command. > > Any device that does return that response must also support > START STOP UNIT or it's misleading the application client. > In that case, falling through to the EH to send START STOP > UNIT is the right thing to do. > > SET TARGET PORT GROUPS is questionable too; it also has > different CbCS permissions and asymmetric access state > requirements than START STOP UNIT with START=1. > > Perhaps that new "return FAILED" should be skipped if the > opcode is REPORT TARGET PORT GROUPS or SET TARGET PORT > GROUPS? I'm fine with that. Wendy - do you want to make these changes and resend? Thanks, Brian -- Brian King Power Linux I/O IBM Linux Technology Center ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2014-10-28 9:04 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2014-10-27 18:01 [PATCH 0/2] TUR path is down after adapter gets reset in multipath configuration wenxiong 2014-10-27 18:01 ` [PATCH 1/2] scsi: TUR path is down after adapter gets reset in multipath configuration(scsi_error.c) wenxiong 2014-10-27 19:07 ` Elliott, Robert (Server Storage) 2014-10-28 9:04 ` Christoph Hellwig 2014-10-27 18:01 ` [PATCH 2/2] scsi: TUR path is down after adapter gets reset in multipath configuration(scsi_dh_alus.c) wenxiong 2014-10-27 19:56 ` Elliott, Robert (Server Storage) 2014-10-27 22:48 ` Brian King
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox