* [PATCHv2] sd: retry READ CAPACITY for ALUA state transition
@ 2015-07-06 11:12 Hannes Reinecke
2015-07-06 15:13 ` Bart Van Assche
0 siblings, 1 reply; 6+ messages in thread
From: Hannes Reinecke @ 2015-07-06 11:12 UTC (permalink / raw)
To: James Bottomley; +Cc: Christoph Hellwig, linux-scsi, Hannes Reinecke
During ALUA state transitions the device might return
a sense code 02/04/0a (Logical unit not accessible, asymmetric
access state transition). As this is a transient error
we should just retry the READ CAPACITY call after 1 second
until the state transition finishes and the correct
capacity can be returned.
At the same time we should break out of the loop after
2 minutes to avoid unbounded retries.
Signed-off-by: Hannes Reinecke <hare@suse.de>
---
drivers/scsi/sd.c | 21 +++++++++++++++++++++
1 file changed, 21 insertions(+)
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 3b2fcb4..f45b8fe 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1934,6 +1934,7 @@ static void read_capacity_error(struct scsi_disk *sdkp, struct scsi_device *sdp,
#endif
#define READ_CAPACITY_RETRIES_ON_RESET 10
+#define READ_CAPACITY_RETRIES_ON_TRANSITION 120
static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp,
unsigned char *buffer)
@@ -1943,6 +1944,7 @@ static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp,
int sense_valid = 0;
int the_result;
int retries = 3, reset_retries = READ_CAPACITY_RETRIES_ON_RESET;
+ int transition_retries = READ_CAPACITY_RETRIES_ON_TRANSITION;
unsigned int alignment;
unsigned long long lba;
unsigned sector_size;
@@ -1981,6 +1983,15 @@ static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp,
* give it one more chance */
if (--reset_retries > 0)
continue;
+ if (sense_valid &&
+ sshdr.sense_key == NOT_READY &&
+ sshdr.asc == 0x04 && sshdr.ascq == 0x0A) {
+ /* ALUA state transition; retry after delay */
+ if (--transition_retries > 0) {
+ msleep(1000);
+ continue;
+ }
+ }
}
retries--;
@@ -2039,6 +2050,7 @@ static int read_capacity_10(struct scsi_disk *sdkp, struct scsi_device *sdp,
int sense_valid = 0;
int the_result;
int retries = 3, reset_retries = READ_CAPACITY_RETRIES_ON_RESET;
+ int transition_retries = READ_CAPACITY_RETRIES_ON_TRANSITION;
sector_t lba;
unsigned sector_size;
@@ -2063,6 +2075,15 @@ static int read_capacity_10(struct scsi_disk *sdkp, struct scsi_device *sdp,
* give it one more chance */
if (--reset_retries > 0)
continue;
+ if (sense_valid &&
+ sshdr.sense_key == NOT_READY &&
+ sshdr.asc == 0x04 && sshdr.ascq == 0x0A) {
+ /* ALUA state transition; retry after delay */
+ if (--transition_retries > 0) {
+ msleep(1000);
+ continue;
+ }
+ }
}
retries--;
--
1.8.5.2
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCHv2] sd: retry READ CAPACITY for ALUA state transition
2015-07-06 11:12 [PATCHv2] sd: retry READ CAPACITY for ALUA state transition Hannes Reinecke
@ 2015-07-06 15:13 ` Bart Van Assche
2015-07-06 20:57 ` James Bottomley
0 siblings, 1 reply; 6+ messages in thread
From: Bart Van Assche @ 2015-07-06 15:13 UTC (permalink / raw)
To: Hannes Reinecke, James Bottomley; +Cc: Christoph Hellwig, linux-scsi
On 07/06/2015 04:12 AM, Hannes Reinecke wrote:
> During ALUA state transitions the device might return
> a sense code 02/04/0a (Logical unit not accessible, asymmetric
> access state transition). As this is a transient error
> we should just retry the READ CAPACITY call after 1 second
> until the state transition finishes and the correct
> capacity can be returned.
> At the same time we should break out of the loop after
> 2 minutes to avoid unbounded retries.
>
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> ---
> drivers/scsi/sd.c | 21 +++++++++++++++++++++
> 1 file changed, 21 insertions(+)
>
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index 3b2fcb4..f45b8fe 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -1934,6 +1934,7 @@ static void read_capacity_error(struct scsi_disk *sdkp, struct scsi_device *sdp,
> #endif
>
> #define READ_CAPACITY_RETRIES_ON_RESET 10
> +#define READ_CAPACITY_RETRIES_ON_TRANSITION 120
>
> static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp,
> unsigned char *buffer)
> @@ -1943,6 +1944,7 @@ static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp,
> int sense_valid = 0;
> int the_result;
> int retries = 3, reset_retries = READ_CAPACITY_RETRIES_ON_RESET;
> + int transition_retries = READ_CAPACITY_RETRIES_ON_TRANSITION;
> unsigned int alignment;
> unsigned long long lba;
> unsigned sector_size;
> @@ -1981,6 +1983,15 @@ static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp,
> * give it one more chance */
> if (--reset_retries > 0)
> continue;
> + if (sense_valid &&
> + sshdr.sense_key == NOT_READY &&
> + sshdr.asc == 0x04 && sshdr.ascq == 0x0A) {
> + /* ALUA state transition; retry after delay */
> + if (--transition_retries > 0) {
> + msleep(1000);
> + continue;
> + }
> + }
> }
> retries--;
>
> @@ -2039,6 +2050,7 @@ static int read_capacity_10(struct scsi_disk *sdkp, struct scsi_device *sdp,
> int sense_valid = 0;
> int the_result;
> int retries = 3, reset_retries = READ_CAPACITY_RETRIES_ON_RESET;
> + int transition_retries = READ_CAPACITY_RETRIES_ON_TRANSITION;
> sector_t lba;
> unsigned sector_size;
>
> @@ -2063,6 +2075,15 @@ static int read_capacity_10(struct scsi_disk *sdkp, struct scsi_device *sdp,
> * give it one more chance */
> if (--reset_retries > 0)
> continue;
> + if (sense_valid &&
> + sshdr.sense_key == NOT_READY &&
> + sshdr.asc == 0x04 && sshdr.ascq == 0x0A) {
> + /* ALUA state transition; retry after delay */
> + if (--transition_retries > 0) {
> + msleep(1000);
> + continue;
> + }
> + }
> }
> retries--;
Hello Hannes,
Although I agree that multipathd should handle arrays correctly that
fail READ CAPACITY commands while transitioning, seeing that a new
hard-coded timeout is added in the SCSI initiator code does not make me
enthusiast. The READ_CAPACITY_RETRIES_ON_TRANSITION timeout added
through this patch is independent of the IMPLICIT TRANSITION TIME in the
REPORT TARGET PORT GROUPS response. Has the following already been
considered ?
- If the capacity is not known, let the scsi_dh_alua handler submit a
READ CAPACITY command asynchronously after certain target port group
state transitions. Also let the scsi_dh_alua handler submit a
notification to user space after the capacity changes from "unknown" to
"known".
- Let multipathd ignore paths for which READ CAPACITY failed until the
capacity becomes known.
Thanks,
Bart.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCHv2] sd: retry READ CAPACITY for ALUA state transition
2015-07-06 15:13 ` Bart Van Assche
@ 2015-07-06 20:57 ` James Bottomley
2015-07-07 6:18 ` Hannes Reinecke
0 siblings, 1 reply; 6+ messages in thread
From: James Bottomley @ 2015-07-06 20:57 UTC (permalink / raw)
To: Bart Van Assche; +Cc: Hannes Reinecke, Christoph Hellwig, linux-scsi
On Mon, 2015-07-06 at 08:13 -0700, Bart Van Assche wrote:
> On 07/06/2015 04:12 AM, Hannes Reinecke wrote:
> > During ALUA state transitions the device might return
> > a sense code 02/04/0a (Logical unit not accessible, asymmetric
> > access state transition). As this is a transient error
> > we should just retry the READ CAPACITY call after 1 second
> > until the state transition finishes and the correct
> > capacity can be returned.
> > At the same time we should break out of the loop after
> > 2 minutes to avoid unbounded retries.
> >
> > Signed-off-by: Hannes Reinecke <hare@suse.de>
> > ---
> > drivers/scsi/sd.c | 21 +++++++++++++++++++++
> > 1 file changed, 21 insertions(+)
> >
> > diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> > index 3b2fcb4..f45b8fe 100644
> > --- a/drivers/scsi/sd.c
> > +++ b/drivers/scsi/sd.c
> > @@ -1934,6 +1934,7 @@ static void read_capacity_error(struct scsi_disk *sdkp, struct scsi_device *sdp,
> > #endif
> >
> > #define READ_CAPACITY_RETRIES_ON_RESET 10
> > +#define READ_CAPACITY_RETRIES_ON_TRANSITION 120
> >
> > static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp,
> > unsigned char *buffer)
> > @@ -1943,6 +1944,7 @@ static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp,
> > int sense_valid = 0;
> > int the_result;
> > int retries = 3, reset_retries = READ_CAPACITY_RETRIES_ON_RESET;
> > + int transition_retries = READ_CAPACITY_RETRIES_ON_TRANSITION;
> > unsigned int alignment;
> > unsigned long long lba;
> > unsigned sector_size;
> > @@ -1981,6 +1983,15 @@ static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp,
> > * give it one more chance */
> > if (--reset_retries > 0)
> > continue;
> > + if (sense_valid &&
> > + sshdr.sense_key == NOT_READY &&
> > + sshdr.asc == 0x04 && sshdr.ascq == 0x0A) {
> > + /* ALUA state transition; retry after delay */
> > + if (--transition_retries > 0) {
> > + msleep(1000);
> > + continue;
> > + }
> > + }
> > }
> > retries--;
> >
> > @@ -2039,6 +2050,7 @@ static int read_capacity_10(struct scsi_disk *sdkp, struct scsi_device *sdp,
> > int sense_valid = 0;
> > int the_result;
> > int retries = 3, reset_retries = READ_CAPACITY_RETRIES_ON_RESET;
> > + int transition_retries = READ_CAPACITY_RETRIES_ON_TRANSITION;
> > sector_t lba;
> > unsigned sector_size;
> >
> > @@ -2063,6 +2075,15 @@ static int read_capacity_10(struct scsi_disk *sdkp, struct scsi_device *sdp,
> > * give it one more chance */
> > if (--reset_retries > 0)
> > continue;
> > + if (sense_valid &&
> > + sshdr.sense_key == NOT_READY &&
> > + sshdr.asc == 0x04 && sshdr.ascq == 0x0A) {
> > + /* ALUA state transition; retry after delay */
> > + if (--transition_retries > 0) {
> > + msleep(1000);
> > + continue;
> > + }
> > + }
> > }
> > retries--;
>
> Hello Hannes,
>
> Although I agree that multipathd should handle arrays correctly that
> fail READ CAPACITY commands while transitioning, seeing that a new
> hard-coded timeout is added in the SCSI initiator code does not make me
> enthusiast. The READ_CAPACITY_RETRIES_ON_TRANSITION timeout added
> through this patch is independent of the IMPLICIT TRANSITION TIME in the
> REPORT TARGET PORT GROUPS response. Has the following already been
> considered ?
> - If the capacity is not known, let the scsi_dh_alua handler submit a
> READ CAPACITY command asynchronously after certain target port group
> state transitions. Also let the scsi_dh_alua handler submit a
> notification to user space after the capacity changes from "unknown" to
> "known".
> - Let multipathd ignore paths for which READ CAPACITY failed until the
> capacity becomes known.
Yes, I'm with something like this. The patch you propose will be a busy
wait for userspace which they may not want.
James
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCHv2] sd: retry READ CAPACITY for ALUA state transition
2015-07-06 20:57 ` James Bottomley
@ 2015-07-07 6:18 ` Hannes Reinecke
2015-07-07 20:48 ` Bart Van Assche
0 siblings, 1 reply; 6+ messages in thread
From: Hannes Reinecke @ 2015-07-07 6:18 UTC (permalink / raw)
To: James Bottomley, Bart Van Assche; +Cc: Christoph Hellwig, linux-scsi
On 07/06/2015 10:57 PM, James Bottomley wrote:
> On Mon, 2015-07-06 at 08:13 -0700, Bart Van Assche wrote:
>> On 07/06/2015 04:12 AM, Hannes Reinecke wrote:
>>> During ALUA state transitions the device might return
>>> a sense code 02/04/0a (Logical unit not accessible, asymmetric
>>> access state transition). As this is a transient error
>>> we should just retry the READ CAPACITY call after 1 second
>>> until the state transition finishes and the correct
>>> capacity can be returned.
>>> At the same time we should break out of the loop after
>>> 2 minutes to avoid unbounded retries.
>>>
>>> Signed-off-by: Hannes Reinecke <hare@suse.de>
>>> ---
>>> drivers/scsi/sd.c | 21 +++++++++++++++++++++
>>> 1 file changed, 21 insertions(+)
>>>
>>> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
>>> index 3b2fcb4..f45b8fe 100644
>>> --- a/drivers/scsi/sd.c
>>> +++ b/drivers/scsi/sd.c
>>> @@ -1934,6 +1934,7 @@ static void read_capacity_error(struct scsi_disk *sdkp, struct scsi_device *sdp,
>>> #endif
>>>
>>> #define READ_CAPACITY_RETRIES_ON_RESET 10
>>> +#define READ_CAPACITY_RETRIES_ON_TRANSITION 120
>>>
>>> static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp,
>>> unsigned char *buffer)
>>> @@ -1943,6 +1944,7 @@ static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp,
>>> int sense_valid = 0;
>>> int the_result;
>>> int retries = 3, reset_retries = READ_CAPACITY_RETRIES_ON_RESET;
>>> + int transition_retries = READ_CAPACITY_RETRIES_ON_TRANSITION;
>>> unsigned int alignment;
>>> unsigned long long lba;
>>> unsigned sector_size;
>>> @@ -1981,6 +1983,15 @@ static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp,
>>> * give it one more chance */
>>> if (--reset_retries > 0)
>>> continue;
>>> + if (sense_valid &&
>>> + sshdr.sense_key == NOT_READY &&
>>> + sshdr.asc == 0x04 && sshdr.ascq == 0x0A) {
>>> + /* ALUA state transition; retry after delay */
>>> + if (--transition_retries > 0) {
>>> + msleep(1000);
>>> + continue;
>>> + }
>>> + }
>>> }
>>> retries--;
>>>
>>> @@ -2039,6 +2050,7 @@ static int read_capacity_10(struct scsi_disk *sdkp, struct scsi_device *sdp,
>>> int sense_valid = 0;
>>> int the_result;
>>> int retries = 3, reset_retries = READ_CAPACITY_RETRIES_ON_RESET;
>>> + int transition_retries = READ_CAPACITY_RETRIES_ON_TRANSITION;
>>> sector_t lba;
>>> unsigned sector_size;
>>>
>>> @@ -2063,6 +2075,15 @@ static int read_capacity_10(struct scsi_disk *sdkp, struct scsi_device *sdp,
>>> * give it one more chance */
>>> if (--reset_retries > 0)
>>> continue;
>>> + if (sense_valid &&
>>> + sshdr.sense_key == NOT_READY &&
>>> + sshdr.asc == 0x04 && sshdr.ascq == 0x0A) {
>>> + /* ALUA state transition; retry after delay */
>>> + if (--transition_retries > 0) {
>>> + msleep(1000);
>>> + continue;
>>> + }
>>> + }
>>> }
>>> retries--;
>>
>> Hello Hannes,
>>
>> Although I agree that multipathd should handle arrays correctly that
>> fail READ CAPACITY commands while transitioning, seeing that a new
>> hard-coded timeout is added in the SCSI initiator code does not make me
>> enthusiast. The READ_CAPACITY_RETRIES_ON_TRANSITION timeout added
>> through this patch is independent of the IMPLICIT TRANSITION TIME in the
>> REPORT TARGET PORT GROUPS response. Has the following already been
>> considered ?
>> - If the capacity is not known, let the scsi_dh_alua handler submit a
>> READ CAPACITY command asynchronously after certain target port group
>> state transitions. Also let the scsi_dh_alua handler submit a
>> notification to user space after the capacity changes from "unknown" to
>> "known".
I'd rather prefer to have this handled with the existing 'rescan'
functionality. IE if the path becomes available (ie returns a state
of 'optimal' or 'non-optimal' scsi_dh_alua will be invoking 'rescan'
on that device, which should cause the capacity to be updated.
>> - Let multipathd ignore paths for which READ CAPACITY failed until the
>> capacity becomes known.
I've already sent patches to multipath to simply ignore the results
from a READ CAPACITY; so this is not an issue.
Paths with a capacity of '0' will be happily accepted as multipath
devices with path status 'standby'.
>
> Yes, I'm with something like this. The patch you propose will be a busy
> wait for userspace which they may not want.
>
I'm fully aware that this patch will induce a busy wait if the
device happens to be in transitioning at that time.
This patch is just a simple solution to this problem.
The 'correct' solution is actually a bit more involved, as it would
require the device handler to keep track of the multipath topology:
Currently the device handler are per ITL nexus, and invoked on
activation only (ie for explicit ALUA), so the ALUA state only
reflect the state after the last call to 'activate'.
I have a patchset for moving scsi_dh_alua onto a workqueue, so that
it'll be able to keep track of the ALUA state per TL nexus (note the
missing 'I'). This will not only reduce the number of RTPGs send to
the target (scsi_dh_rdac already does this), but will also allow
scsi_dh_alua to keep track of the ALUA state of the relative target
port of the target port group.
However, to handle the above case correctly we would need to keep
track of the entire multipath topology, to figure out which devices
belong to that relative target port and might need to be updated
(there might be several paths in standby, and we will have sent the
RTPG only for one of them).
Patches for that are not done yet, so I thought the above patch
would be a simple stop-gap measure.
Cheers,
Hannes
--
Dr. Hannes Reinecke zSeries & Storage
hare@suse.de +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCHv2] sd: retry READ CAPACITY for ALUA state transition
2015-07-07 6:18 ` Hannes Reinecke
@ 2015-07-07 20:48 ` Bart Van Assche
2015-07-08 6:20 ` Hannes Reinecke
0 siblings, 1 reply; 6+ messages in thread
From: Bart Van Assche @ 2015-07-07 20:48 UTC (permalink / raw)
To: Hannes Reinecke, James Bottomley
Cc: Christoph Hellwig, linux-scsi@vger.kernel.org
On 07/06/2015 11:18 PM, Hannes Reinecke wrote:
> However, to handle the above case correctly we would need to keep
> track of the entire multipath topology, to figure out which devices
> belong to that relative target port and might need to be updated
> (there might be several paths in standby, and we will have sent the
> RTPG only for one of them).
> Patches for that are not done yet, so I thought the above patch
> would be a simple stop-gap measure.
Hello Hannes,
Are you sure that keeping track of the entire multipath topology would
be required to implement what I proposed ? In the patch "scsi_dh_alua:
Use separate alua_port_group structure"
(http://thread.gmane.org/gmane.linux.scsi/101388/focus=101380) I see
that the new scsi_dh_alua code keeps track of the target port group
(TPG) ID and relative target port (RTP) ID. As you know this information
can be queried for each LUN via the Device Identification VPD page. How
about caching the TPG and RTP IDs per LUN such that the scsi_dh_alua
code can figure out which LUNs are associated with which target ports by
iterating over the known LUNs ?
Thanks,
Bart.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCHv2] sd: retry READ CAPACITY for ALUA state transition
2015-07-07 20:48 ` Bart Van Assche
@ 2015-07-08 6:20 ` Hannes Reinecke
0 siblings, 0 replies; 6+ messages in thread
From: Hannes Reinecke @ 2015-07-08 6:20 UTC (permalink / raw)
To: Bart Van Assche, James Bottomley
Cc: Christoph Hellwig, linux-scsi@vger.kernel.org
On 07/07/2015 10:48 PM, Bart Van Assche wrote:
> On 07/06/2015 11:18 PM, Hannes Reinecke wrote:
>> However, to handle the above case correctly we would need to keep
>> track of the entire multipath topology, to figure out which devices
>> belong to that relative target port and might need to be updated
>> (there might be several paths in standby, and we will have sent the
>> RTPG only for one of them).
>> Patches for that are not done yet, so I thought the above patch
>> would be a simple stop-gap measure.
>
> Hello Hannes,
>
> Are you sure that keeping track of the entire multipath topology
> would be required to implement what I proposed ? In the patch
> "scsi_dh_alua: Use separate alua_port_group structure"
> (http://thread.gmane.org/gmane.linux.scsi/101388/focus=101380) I see
> that the new scsi_dh_alua code keeps track of the target port group
> (TPG) ID and relative target port (RTP) ID. As you know this
> information can be queried for each LUN via the Device
> Identification VPD page. How about caching the TPG and RTP IDs per
> LUN such that the scsi_dh_alua code can figure out which LUNs are
> associated with which target ports by iterating over the known LUNs ?
>
I did intentionally _not_ store a pointer to the attached LUN in the
alua_port_group structure, as this would induce yet another
potential race condition when devices are being removed.
But meanwhile I've come up with a simpler patch which handles this
rather elegantly (methinks :-), and doesn't require any additional
infrastructure.
I'll be posting it shortly.
Cheers,
Hannes
--
Dr. Hannes Reinecke zSeries & Storage
hare@suse.de +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2015-07-08 6:20 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-07-06 11:12 [PATCHv2] sd: retry READ CAPACITY for ALUA state transition Hannes Reinecke
2015-07-06 15:13 ` Bart Van Assche
2015-07-06 20:57 ` James Bottomley
2015-07-07 6:18 ` Hannes Reinecke
2015-07-07 20:48 ` Bart Van Assche
2015-07-08 6:20 ` Hannes Reinecke
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).