From: Hannes Reinecke <hare@suse.de>
To: Mike Christie <michaelc@cs.wisc.edu>
Cc: Mike Snitzer <snitzer@redhat.com>,
"Nicholas A. Bellinger" <nab@linux-iscsi.org>,
James Bottomley <James.Bottomley@suse.de>,
linux-scsi@vger.kernel.org
Subject: Re: scsi_dh_alua: add missing transitioning state support
Date: Wed, 22 Sep 2010 12:13:07 +0200 [thread overview]
Message-ID: <4C99D6B3.3090003@suse.de> (raw)
In-Reply-To: <4C992026.5080402@cs.wisc.edu>
[-- Attachment #1: Type: text/plain, Size: 4863 bytes --]
Mike Christie wrote:
> On 09/21/2010 02:33 PM, Mike Snitzer wrote:
>> On Mon, Sep 20 2010 at 10:27pm -0400,
>> Mike Christie<michaelc@cs.wisc.edu> wrote:
>>
>>> On 09/20/2010 10:35 AM, Mike Snitzer wrote:
>>>> Hi Hannes,
>>>>
>>>> On Tue, Aug 31 2010 at 11:11am -0400,
>>>> Mike Snitzer<snitzer@redhat.com> wrote:
>>>>
>>>>> On Mon, Aug 30 2010 at 5:36am -0400,
>>>>> Hannes Reinecke<hare@suse.de> wrote:
>>>>>
>>>>>> Nicholas A. Bellinger wrote:
>>>>>>> On Tue, 2010-08-17 at 15:05 -0400, Mike Snitzer wrote:
>>>>>>>> Handle transitioning in the prep_fn.
>>>>>>>> Handle transitioning in alua_rtpg's implicit alua code too.
>>>>>>>>
>>>>>>>> These gaps were identified during controller failover testing of an
>>>>>>>> ALUA array.
>>>>>>>>
>>>>>>>> Signed-off-by: Mike Snitzer<snitzer@redhat.com>
>>>>>>>> ---
>>>>>>>> drivers/scsi/device_handler/scsi_dh_alua.c | 10 +++++++---
>>>>>>>> 1 files changed, 7 insertions(+), 3 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/drivers/scsi/device_handler/scsi_dh_alua.c
>>>>>>>> b/drivers/scsi/device_handler/scsi_dh_alua.c
>>>>>>>> index 1a970a7..c1eedc5 100644
>>>>>>>> --- a/drivers/scsi/device_handler/scsi_dh_alua.c
>>>>>>>> +++ b/drivers/scsi/device_handler/scsi_dh_alua.c
>>>>>>>> @@ -616,6 +616,9 @@ static int alua_rtpg(struct scsi_device
>>>>>>>> *sdev, struct alua_dh_data *h)
>>>>>>>> h->state == TPGS_STATE_STANDBY)
>>>>>>>> /* Useable path if active */
>>>>>>>> err = SCSI_DH_OK;
>>>>>>>> + else if (h->state == TPGS_STATE_TRANSITIONING)
>>>>>>>> + /* State transition, retry */
>>>>>>>> + goto retry;
>>>>>>>> else
>>>>>>>> /* Path unuseable for unavailable/offline */
>>>>>>>> err = SCSI_DH_DEV_OFFLINED;
>>>>>>>> @@ -698,13 +701,14 @@ static int alua_prep_fn(struct scsi_device
>>>>>>>> *sdev, struct request *req)
>>>>>>>> struct alua_dh_data *h = get_alua_data(sdev);
>>>>>>>> int ret = BLKPREP_OK;
>>>>>>>>
>>>>>>>> - if (h->state != TPGS_STATE_OPTIMIZED&&
>>>>>>>> - h->state != TPGS_STATE_NONOPTIMIZED) {
>>>>>>>> + if (h->state == TPGS_STATE_TRANSITIONING)
>>>>>>>> + ret = BLKPREP_DEFER;
>>>>>>>> + else if (h->state != TPGS_STATE_OPTIMIZED&&
>>>>>>>> + h->state != TPGS_STATE_NONOPTIMIZED) {
>>>>>>>> ret = BLKPREP_KILL;
>>>>>>>> req->cmd_flags |= REQ_QUIET;
>>>>>>>> }
>>>>>>>> return ret;
>>>>>>>> -
>>>>>>>> }
>>>>>>>>
>>>>>>>
>>>>>>> Makes sense to me..
>>>>>>>
>>>>>>> Acked-by: Nicholas A. Bellinger<nab@linux-iscsi.org>
>>>>>>>
>>>>>> Not so fast. There are two problems with this approach:
>>>>>>
>>>>>> The path is retried indefinitely. Arrays are _supposed_ to be in
>>>>>> 'transitioning'
>>>>>> only temporary; however, if the array is stuck due to a fw error
>>>>>> we're stuck in 'defer',
>>>>>> too.
>>>>>
>>>>> And what is the problem with that? The IO will eventually time out.
>>>>
>>>> To restate as a question: even though we'll retry in alua_rtpg();
>>>> shouldn't the SCSI command eventually time out (via
>>>> scsi_attempt_requeue_command)?
>>>
>>> That function is only in RHEL. Requests that are prepd and sent to
>>> the scsi layer and driver would eventually timeout in
>>> scsi_softirq_done in upstream.
>>>
>>> alua_prep_fn prevents the IO from getting sent to the scsi layer so
>>> we do not hit the check in scsi_softirq_done though.
>>
>> That is only the case if alua_prep_fn were to return BLKPREP_DEFER
>> right?
>
> Yeah. I misread the email above. Just to clarify..
>
> For the alua case, we will always retry due the prep_fn issue I mentioned.
>
> For the alua_rtpg() retry case you were discussing we will also always
> retry, because the timer check in
> scsi_attempt_requeue_command/scsi_softirq_done will break us from
> retrying forever in that execution of the request started by the
> submit_rtpg call. However, the added goto retry will would just end up
> starting a another execution. To handle Hannes comment I think you would
> want to add some retry/timer checks in alua_rtpg to prevent that from
> retrying forever.
>
>>
>> 2) the patch also modified alua_rtpg() so implicit ALUA would retry
>> (just like explicit ALUA currently does) if TPGS_STATE_TRANSITIONING
>> - so why should we avoid retry for implicit but do it for explicit?
>
> Leaving that for Hannes. I cannot think of a reason. Probably just did
> not do it.
Finally I got around to answering this.
I've attached a patch which I made the other day which seems to work
reasonably well.
Looks better from my side, so if you agree I'll be sending it
upstream properly.
Cheers,
Hannes
--
Dr. Hannes Reinecke zSeries & Storage
hare@suse.de +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)
[-- Attachment #2: scsi-dh-alua-handle-all-states --]
[-- Type: text/plain, Size: 4696 bytes --]
>From d3f02c90db3e3177309b78726d082e17dd772ee2 Mon Sep 17 00:00:00 2001
From: Hannes Reinecke <hare@suse.de>
Date: Wed, 22 Sep 2010 12:09:07 +0200
Subject: [PATCH] scsi_dh_alua: Handle all states correctly
For ALUA we should be handling all states, independent of whether
is explicit or implicit. For 'Transitioning' we should be retry
for a certain amount of time; after that an error should be
returned.
Signed-off-by: Hannes Reinecke <hare@suse.de>
diff --git a/drivers/scsi/device_handler/scsi_dh_alua.c b/drivers/scsi/device_handler/scsi_dh_alua.c
index 1a970a7..c6f57e3 100644
--- a/drivers/scsi/device_handler/scsi_dh_alua.c
+++ b/drivers/scsi/device_handler/scsi_dh_alua.c
@@ -31,6 +31,7 @@
#define TPGS_STATE_NONOPTIMIZED 0x1
#define TPGS_STATE_STANDBY 0x2
#define TPGS_STATE_UNAVAILABLE 0x3
+#define TPGS_STATE_LBA_DEPENDENT 0x4
#define TPGS_STATE_OFFLINE 0xe
#define TPGS_STATE_TRANSITIONING 0xf
@@ -39,6 +40,7 @@
#define TPGS_SUPPORT_NONOPTIMIZED 0x02
#define TPGS_SUPPORT_STANDBY 0x04
#define TPGS_SUPPORT_UNAVAILABLE 0x08
+#define TPGS_SUPPORT_LBA_DEPENDENT 0x10
#define TPGS_SUPPORT_OFFLINE 0x40
#define TPGS_SUPPORT_TRANSITION 0x80
@@ -460,6 +462,8 @@ static char print_alua_state(int state)
return 'S';
case TPGS_STATE_UNAVAILABLE:
return 'U';
+ case TPGS_STATE_LBA_DEPENDENT:
+ return 'L';
case TPGS_STATE_OFFLINE:
return 'O';
case TPGS_STATE_TRANSITIONING:
@@ -542,7 +546,9 @@ static int alua_rtpg(struct scsi_device *sdev, struct alua_dh_data *h)
int len, k, off, valid_states = 0;
char *ucp;
unsigned err;
+ unsigned long expiry, interval = 10;
+ expiry = round_jiffies_up(jiffies + ALUA_FAILOVER_TIMEOUT);
retry:
err = submit_rtpg(sdev, h);
@@ -553,7 +559,7 @@ static int alua_rtpg(struct scsi_device *sdev, struct alua_dh_data *h)
return SCSI_DH_IO;
err = alua_check_sense(sdev, &sense_hdr);
- if (err == ADD_TO_MLQUEUE)
+ if (err == ADD_TO_MLQUEUE && time_before(jiffies, expiry))
goto retry;
sdev_printk(KERN_INFO, sdev,
"%s: rtpg sense code %02x/%02x/%02x\n",
@@ -587,38 +593,36 @@ static int alua_rtpg(struct scsi_device *sdev, struct alua_dh_data *h)
}
sdev_printk(KERN_INFO, sdev,
- "%s: port group %02x state %c supports %c%c%c%c%c%c\n",
+ "%s: port group %02x state %c supports %c%c%c%c%c%c%c\n",
ALUA_DH_NAME, h->group_id, print_alua_state(h->state),
valid_states&TPGS_SUPPORT_TRANSITION?'T':'t',
valid_states&TPGS_SUPPORT_OFFLINE?'O':'o',
+ valid_states&TPGS_SUPPORT_LBA_DEPENDENT?'L':'l',
valid_states&TPGS_SUPPORT_UNAVAILABLE?'U':'u',
valid_states&TPGS_SUPPORT_STANDBY?'S':'s',
valid_states&TPGS_SUPPORT_NONOPTIMIZED?'N':'n',
valid_states&TPGS_SUPPORT_OPTIMIZED?'A':'a');
- if (h->tpgs & TPGS_MODE_EXPLICIT) {
- switch (h->state) {
- case TPGS_STATE_TRANSITIONING:
+ switch (h->state) {
+ case TPGS_STATE_TRANSITIONING:
+ if (time_before(jiffies, expiry)) {
/* State transition, retry */
+ interval *= 10;
+ msleep(interval);
goto retry;
- break;
- case TPGS_STATE_OFFLINE:
- /* Path is offline, fail */
- err = SCSI_DH_DEV_OFFLINED;
- break;
- default:
- break;
}
- } else {
- /* Only Implicit ALUA support */
- if (h->state == TPGS_STATE_OPTIMIZED ||
- h->state == TPGS_STATE_NONOPTIMIZED ||
- h->state == TPGS_STATE_STANDBY)
- /* Useable path if active */
- err = SCSI_DH_OK;
- else
- /* Path unuseable for unavailable/offline */
- err = SCSI_DH_DEV_OFFLINED;
+ /* Transitioning time exceeded */
+ err = SCSI_DH_RETRY;
+ break;
+ case TPGS_STATE_OFFLINE:
+ case TPGS_STATE_UNAVAILABLE:
+ /* Path unuseable for unavailable/offline */
+ err = SCSI_DH_DEV_OFFLINED;
+ break;
+ default:
+ /* Useable path if active */
+ err = SCSI_DH_OK;
+ break;
}
return err;
}
@@ -672,7 +676,9 @@ static int alua_activate(struct scsi_device *sdev,
goto out;
}
- if (h->tpgs & TPGS_MODE_EXPLICIT && h->state != TPGS_STATE_OPTIMIZED) {
+ if (h->tpgs & TPGS_MODE_EXPLICIT &&
+ h->state != TPGS_STATE_OPTIMIZED &&
+ h->state != TPGS_STATE_LBA_DEPENDENT) {
h->callback_fn = fn;
h->callback_data = data;
err = submit_stpg(h);
@@ -698,8 +704,11 @@ static int alua_prep_fn(struct scsi_device *sdev, struct request *req)
struct alua_dh_data *h = get_alua_data(sdev);
int ret = BLKPREP_OK;
- if (h->state != TPGS_STATE_OPTIMIZED &&
- h->state != TPGS_STATE_NONOPTIMIZED) {
+ if (h->state == TPGS_STATE_TRANSITIONING)
+ ret = BLKPREP_DEFER;
+ else if (h->state != TPGS_STATE_OPTIMIZED &&
+ h->state != TPGS_STATE_NONOPTIMIZED &&
+ h->state != TPGS_STATE_LBA_DEPENDENT) {
ret = BLKPREP_KILL;
req->cmd_flags |= REQ_QUIET;
}
next prev parent reply other threads:[~2010-09-22 10:13 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-08-17 19:05 [PATCH] scsi_dh_alua: add missing transitioning state support Mike Snitzer
2010-08-17 19:23 ` Nicholas A. Bellinger
2010-08-30 9:36 ` Hannes Reinecke
2010-08-31 15:11 ` Mike Snitzer
2010-09-20 15:35 ` Mike Snitzer
2010-09-21 2:27 ` Mike Christie
2010-09-21 2:28 ` Mike Christie
2010-09-21 19:33 ` Mike Snitzer
2010-09-21 21:14 ` Mike Christie
2010-09-22 10:13 ` Hannes Reinecke [this message]
2010-09-22 12:29 ` Mike Snitzer
2010-09-23 7:15 ` Hannes Reinecke
2010-09-23 13:44 ` Mike Snitzer
2010-09-23 18:53 ` Mike Snitzer
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4C99D6B3.3090003@suse.de \
--to=hare@suse.de \
--cc=James.Bottomley@suse.de \
--cc=linux-scsi@vger.kernel.org \
--cc=michaelc@cs.wisc.edu \
--cc=nab@linux-iscsi.org \
--cc=snitzer@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.