From: Bart Van Assche <Bart.VanAssche@sandisk.com>
To: "mauricfo@linux.vnet.ibm.com" <mauricfo@linux.vnet.ibm.com>,
"hare@suse.de" <hare@suse.de>,
"martin.petersen@oracle.com" <martin.petersen@oracle.com>
Cc: "linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>
Subject: Re: [PATCH 1/4] scsi: scsi_dh_alua: allow I/O in the target port unavailable state
Date: Thu, 13 Apr 2017 21:14:14 +0000
Message-ID: <1492118053.24345.20.camel@sandisk.com>
In-Reply-To: <1491873481-23900-2-git-send-email-mauricfo@linux.vnet.ibm.com>
On Mon, 2017-04-10 at 22:17 -0300, Mauricio Faria de Oliveira wrote:
> According to SPC-4 (5.15.2.4.5 Unavailable state), the unavailable
> state may (or may not) transition to other states (e.g., microcode
> downloading or hardware error, which may be temporary or permanent
> conditions, respectively).
>
> But, scsi_dh_alua currently fails the I/O requests early once that
> state is established (in alua_prep_fn()), which provides no chance
> for path checkers going through that function path to really check
> whether the path actually still fails I/O requests or recovered to
> an active state.
>
> This might cause device-mapper multipath to fail all paths to a
> storage system that moves its controllers to the unavailable state
> for firmware upgrades, and never recover them, even though the storage
> system upgrades one controller at a time and brings each back online.
>
> Then I/O requests are blocked indefinitely due to queue_if_no_path
> but the underlying individual paths are fully operational, and can
> be verified as such through other function paths (e.g., SG_IO):
>
> # multipath -l
> mpatha (360050764008100dac000000000000100) dm-0 IBM,2145
> size=40G features='2 queue_if_no_path retain_attached_hw_handler'
> hwhandler='1 alua' wp=rw
> |-+- policy='service-time 0' prio=0 status=enabled
> | |- 1:0:1:0 sdf 8:80 failed undef running
> | `- 2:0:1:0 sdn 8:208 failed undef running
> `-+- policy='service-time 0' prio=0 status=enabled
> |- 1:0:0:0 sdb 8:16 failed undef running
> `- 2:0:0:0 sdj 8:144 failed undef running
>
> # strace -e read \
> sg_dd if=/dev/sdj of=/dev/null bs=512 count=1 iflag=direct \
> 2>&1 | grep 512
> read(3, 0x3fff7ba80000, 512) = -1 EIO (Input/output error)
>
> # strace -e ioctl \
> sg_dd if=/dev/sdj of=/dev/null bs=512 count=1 iflag=direct \
> blk_sgio=1 \
> 2>&1 | grep 512
> ioctl(3, SG_IO, {'S', SG_DXFER_FROM_DEV, cmd[10]=[28, 00, 00, 00,
> 00, 00, 00, 00, 01, 00], <...>) = 0
>
> So, allow I/O to target port (groups) in the unavailable state, so the
> path checkers can actually check them, and schedule a recheck whenever
> the unavailable state is detected so pg->state can be updated properly
> (and further SCSI IO error messages then silenced through alua_prep_fn()).
>
> Once a path checker eventually detects an active state again, the port
> group state will be updated by the path activation call, alua_activate(),
> as it schedules an alua_rtpg() check.
>
> Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
> Reported-by: Naresh Bannoth <nbannoth@in.ibm.com>
> ---
> drivers/scsi/device_handler/scsi_dh_alua.c | 18 ++++++++++++++++++
> 1 file changed, 18 insertions(+)
>
> diff --git a/drivers/scsi/device_handler/scsi_dh_alua.c b/drivers/scsi/device_handler/scsi_dh_alua.c
> index c01b47e5b55a..5e5a33cac951 100644
> --- a/drivers/scsi/device_handler/scsi_dh_alua.c
> +++ b/drivers/scsi/device_handler/scsi_dh_alua.c
> @@ -431,6 +431,20 @@ static int alua_check_sense(struct scsi_device *sdev,
> alua_check(sdev, false);
> return NEEDS_RETRY;
> }
> + if (sense_hdr->asc == 0x04 && sense_hdr->ascq == 0x0c) {
> + /*
> + * LUN Not Accessible - target port in unavailable state.
> + *
> + * It may or may not be possible to transition to other states;
> + * the transition might take a while or not happen at all,
> + * depending on the storage system model, error type, etc.
> + *
> + * Do not retry, so that failover to another target port can occur.
> + * Schedule a recheck to update state for other functions.
> + */
> + alua_check(sdev, true);
> + return SUCCESS;
> + }
> break;
> case UNIT_ATTENTION:
> if (sense_hdr->asc == 0x29 && sense_hdr->ascq == 0x00) {
> @@ -1057,6 +1071,8 @@ static void alua_check(struct scsi_device *sdev, bool force)
> *
> * Fail I/O to all paths not in state
> * active/optimized or active/non-optimized.
> + * Allow I/O to all paths in state unavailable
> + * so path checkers can actually check them.
> */
> static int alua_prep_fn(struct scsi_device *sdev, struct request *req)
> {
> @@ -1072,6 +1088,8 @@ static int alua_prep_fn(struct scsi_device *sdev, struct request *req)
> rcu_read_unlock();
> if (state == SCSI_ACCESS_STATE_TRANSITIONING)
> ret = BLKPREP_DEFER;
> + else if (state == SCSI_ACCESS_STATE_UNAVAILABLE)
> + req->rq_flags |= RQF_QUIET;
> else if (state != SCSI_ACCESS_STATE_OPTIMAL &&
> state != SCSI_ACCESS_STATE_ACTIVE &&
> state != SCSI_ACCESS_STATE_LBA) {

Hello Mauricio,

Please also add support for the "standby" state to both alua_check_sense()
and alua_prep_fn() while you are modifying these functions.

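To illustrate the request, here is a user-space sketch of how the two decisions might look if standby were handled the same way as unavailable. It is not kernel code and not a proposed patch. The enum and function names are hypothetical stand-ins: the state values mirror the kernel's SCSI_ACCESS_STATE_* constants, the verdicts stand in for the BLKPREP_* return values and the RQF_QUIET flag, and the assumption that standby should be treated exactly like unavailable is only one possible reading.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for the kernel's SCSI_ACCESS_STATE_* values. */
enum access_state {
	STATE_OPTIMAL       = 0x00,
	STATE_ACTIVE        = 0x01,
	STATE_STANDBY       = 0x02,
	STATE_UNAVAILABLE   = 0x03,
	STATE_LBA           = 0x04,
	STATE_OFFLINE       = 0x0e,
	STATE_TRANSITIONING = 0x0f,
};

/* Hypothetical verdicts; the driver uses BLKPREP_* plus RQF_QUIET. */
enum prep_verdict {
	PREP_OK,        /* issue the request normally */
	PREP_OK_QUIET,  /* issue it, but suppress error logging */
	PREP_DEFER,     /* requeue until the transition completes */
	PREP_KILL,      /* fail the request without issuing it */
};

/* Models the state dispatch in alua_prep_fn() with standby added. */
static enum prep_verdict prep_verdict_for(enum access_state state)
{
	switch (state) {
	case STATE_TRANSITIONING:
		return PREP_DEFER;
	case STATE_STANDBY:      /* assumption: same handling as unavailable */
	case STATE_UNAVAILABLE:
		return PREP_OK_QUIET;
	case STATE_OPTIMAL:
	case STATE_ACTIVE:
	case STATE_LBA:
		return PREP_OK;
	default:
		return PREP_KILL;
	}
}

/*
 * Models the extra NOT READY test in alua_check_sense():
 * ASC 0x04 with ASCQ 0x0b (target port in standby state) or
 * ASCQ 0x0c (target port in unavailable state) schedules a recheck
 * instead of a retry.
 */
static bool sense_wants_recheck(uint8_t asc, uint8_t ascq)
{
	return asc == 0x04 && (ascq == 0x0b || ascq == 0x0c);
}
```

With this model, prep_verdict_for(STATE_STANDBY) lets the request through quietly so a path checker can probe the port, while ASC/ASCQ 0x04/0x0a (state transition) keeps its existing retry handling and is not matched by sense_wants_recheck().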
Thanks,
Bart.