* [PATCH] sd: always retry READ CAPACITY for ALUA state transition
@ 2015-04-27 9:35 Hannes Reinecke
2015-04-28 21:18 ` James Bottomley
0 siblings, 1 reply; 8+ messages in thread
From: Hannes Reinecke @ 2015-04-27 9:35 UTC
To: James Bottomley; +Cc: Christoph Hellwig, linux-scsi, Hannes Reinecke
During ALUA state transitions the device might return
a sense code 02/04/0a (Logical unit not accessible, asymmetric
access state transition). As this is a transient error
we should just retry the READ CAPACITY call until
the state transition finishes and the correct
capacity can be returned.
Signed-off-by: Hannes Reinecke <hare@suse.de>
---
drivers/scsi/sd.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 79beebf..7178b05 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1987,6 +1987,11 @@ static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp,
 				 * give it one more chance */
 				if (--reset_retries > 0)
 					continue;
+			if (sense_valid &&
+			    sshdr.sense_key == NOT_READY &&
+			    sshdr.asc == 0x04 && sshdr.ascq == 0x0A)
+				/* ALUA state transition; always retry */
+				continue;
 		}
 		retries--;

@@ -2069,6 +2074,11 @@ static int read_capacity_10(struct scsi_disk *sdkp, struct scsi_device *sdp,
 				 * give it one more chance */
 				if (--reset_retries > 0)
 					continue;
+			if (sense_valid &&
+			    sshdr.sense_key == NOT_READY &&
+			    sshdr.asc == 0x04 && sshdr.ascq == 0x0A)
+				/* ALUA state transition; always retry */
+				continue;
 		}
 		retries--;
--
1.8.5.2
* Re: [PATCH] sd: always retry READ CAPACITY for ALUA state transition
2015-04-27 9:35 [PATCH] sd: always retry READ CAPACITY for ALUA state transition Hannes Reinecke
@ 2015-04-28 21:18 ` James Bottomley
2015-04-30 12:26 ` Hannes Reinecke
0 siblings, 1 reply; 8+ messages in thread
From: James Bottomley @ 2015-04-28 21:18 UTC
To: Hannes Reinecke; +Cc: Christoph Hellwig, linux-scsi
On Mon, 2015-04-27 at 11:35 +0200, Hannes Reinecke wrote:
> During ALUA state transitions the device might return
> a sense code 02/04/0a (Logical unit not accessible, asymmetric
> access state transition). As this is a transient error
> we should just retry the READ CAPACITY call until
> the state transition finishes and the correct
> capacity can be returned.
>
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> ---
> drivers/scsi/sd.c | 10 ++++++++++
> 1 file changed, 10 insertions(+)
>
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index 79beebf..7178b05 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -1987,6 +1987,11 @@ static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp,
> * give it one more chance */
> if (--reset_retries > 0)
> continue;
> + if (sense_valid &&
> + sshdr.sense_key == NOT_READY &&
> + sshdr.asc == 0x04 && sshdr.ascq == 0x0A)
> + /* ALUA state transition; always retry */
> + continue;
> }
> retries--;
>
> @@ -2069,6 +2074,11 @@ static int read_capacity_10(struct scsi_disk *sdkp, struct scsi_device *sdp,
> * give it one more chance */
> if (--reset_retries > 0)
> continue;
> + if (sense_valid &&
> + sshdr.sense_key == NOT_READY &&
> + sshdr.asc == 0x04 && sshdr.ascq == 0x0A)
> + /* ALUA state transition; always retry */
> + continue;
> }
> retries--;
>
Got to say I really don't like this infinite retry possibility. How
long does the ALUA transition take? Would increasing retries work (or
even hijacking reset_retries)?
James
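One way to read the suggestion above is a second, finite retry budget that is spent only on the ALUA transitioning sense code. The following is a minimal user-space sketch of that idea, not the code in sd.c: struct sense_hdr, read_cap_fn, read_capacity_bounded() and the fake device are hypothetical stand-ins, and a real driver would also need a delay between attempts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Sense values taken from the patch: NOT READY, ASC 04h, ASCQ 0Ah. */
#define NOT_READY            0x02
#define ASC_LU_NOT_READY     0x04
#define ASCQ_ALUA_TRANSITION 0x0A

/* Hypothetical stand-in for the kernel's struct scsi_sense_hdr. */
struct sense_hdr {
	unsigned char sense_key, asc, ascq;
};

/* Hypothetical callback: issue one READ CAPACITY; return 0 on success,
 * nonzero on failure with *sshdr filled in. */
typedef int (*read_cap_fn)(void *dev, struct sense_hdr *sshdr);

static bool is_alua_transition(const struct sense_hdr *s)
{
	return s->sense_key == NOT_READY &&
	       s->asc == ASC_LU_NOT_READY && s->ascq == ASCQ_ALUA_TRANSITION;
}

/*
 * Retry with two budgets: 'retries' for ordinary failures and
 * 'alua_retries' for attempts answered with "transitioning".
 * Returns 0 on success, -1 once either budget is exhausted.
 */
static int read_capacity_bounded(read_cap_fn issue, void *dev,
				 int retries, int alua_retries)
{
	assert(issue != NULL);
	while (retries > 0) {
		struct sense_hdr sshdr = { 0, 0, 0 };

		if (issue(dev, &sshdr) == 0)
			return 0;
		if (is_alua_transition(&sshdr)) {
			if (--alua_retries > 0)
				continue;	/* transient: keep 'retries' intact */
			return -1;		/* transition never finished */
		}
		retries--;
	}
	return -1;
}

/* Fake device for illustration: "transitioning" for 'busy' calls, then OK. */
struct fake_dev {
	int busy;
	int calls;
};

static int fake_issue(void *dev, struct sense_hdr *sshdr)
{
	struct fake_dev *d = dev;

	d->calls++;
	if (d->busy > 0) {
		d->busy--;
		sshdr->sense_key = NOT_READY;
		sshdr->asc = ASC_LU_NOT_READY;
		sshdr->ascq = ASCQ_ALUA_TRANSITION;
		return -1;
	}
	return 0;
}
```

With a budget of 5, a device that is busy for 3 calls succeeds on the fourth, while a device that never leaves the transitioning state is given up on after 5 calls instead of looping forever.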
* Re: [PATCH] sd: always retry READ CAPACITY for ALUA state transition
2015-04-28 21:18 ` James Bottomley
@ 2015-04-30 12:26 ` Hannes Reinecke
2015-05-01 12:39 ` Martin George
2015-05-01 13:22 ` James Bottomley
0 siblings, 2 replies; 8+ messages in thread
From: Hannes Reinecke @ 2015-04-30 12:26 UTC
To: James Bottomley; +Cc: Christoph Hellwig, linux-scsi
On 04/28/2015 11:18 PM, James Bottomley wrote:
> On Mon, 2015-04-27 at 11:35 +0200, Hannes Reinecke wrote:
>> During ALUA state transitions the device might return
>> a sense code 02/04/0a (Logical unit not accessible, asymmetric
>> access state transition). As this is a transient error
>> we should just retry the READ CAPACITY call until
>> the state transition finishes and the correct
>> capacity can be returned.
>>
>> Signed-off-by: Hannes Reinecke <hare@suse.de>
>> ---
>> drivers/scsi/sd.c | 10 ++++++++++
>> 1 file changed, 10 insertions(+)
>>
>> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
>> index 79beebf..7178b05 100644
>> --- a/drivers/scsi/sd.c
>> +++ b/drivers/scsi/sd.c
>> @@ -1987,6 +1987,11 @@ static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp,
>> * give it one more chance */
>> if (--reset_retries > 0)
>> continue;
>> + if (sense_valid &&
>> + sshdr.sense_key == NOT_READY &&
>> + sshdr.asc == 0x04 && sshdr.ascq == 0x0A)
>> + /* ALUA state transition; always retry */
>> + continue;
>> }
>> retries--;
>>
>> @@ -2069,6 +2074,11 @@ static int read_capacity_10(struct scsi_disk *sdkp, struct scsi_device *sdp,
>> * give it one more chance */
>> if (--reset_retries > 0)
>> continue;
>> + if (sense_valid &&
>> + sshdr.sense_key == NOT_READY &&
>> + sshdr.asc == 0x04 && sshdr.ascq == 0x0A)
>> + /* ALUA state transition; always retry */
>> + continue;
>> }
>> retries--;
>>
>
> Got to say I really don't like this infinite retry possibility. How
> long does the ALUA transition take? Would increasing retries work (or
> even hijacking reset_retries)?
>
Well ... transitioning could be quite long (NetApp FAS has a
transition timeout of 30 _minutes_ ...).
But yeah, I can see about limiting this somewhat.
Cheers,
Hannes
--
Dr. Hannes Reinecke zSeries & Storage
hare@suse.de +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
* Re: [PATCH] sd: always retry READ CAPACITY for ALUA state transition
2015-04-30 12:26 ` Hannes Reinecke
@ 2015-05-01 12:39 ` Martin George
2015-05-01 13:22 ` James Bottomley
1 sibling, 0 replies; 8+ messages in thread
From: Martin George @ 2015-05-01 12:39 UTC
To: Hannes Reinecke, James Bottomley; +Cc: Christoph Hellwig, linux-scsi
On 4/30/2015 5:56 PM, Hannes Reinecke wrote:
> On 04/28/2015 11:18 PM, James Bottomley wrote:
>> On Mon, 2015-04-27 at 11:35 +0200, Hannes Reinecke wrote:
>>> During ALUA state transitions the device might return
>>> a sense code 02/04/0a (Logical unit not accessible, asymmetric
>>> access state transition). As this is a transient error
>>> we should just retry the READ CAPACITY call until
>>> the state transition finishes and the correct
>>> capacity can be returned.
>>>
>>> Signed-off-by: Hannes Reinecke <hare@suse.de>
>>> ---
>>> drivers/scsi/sd.c | 10 ++++++++++
>>> 1 file changed, 10 insertions(+)
>>>
>>> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
>>> index 79beebf..7178b05 100644
>>> --- a/drivers/scsi/sd.c
>>> +++ b/drivers/scsi/sd.c
>>> @@ -1987,6 +1987,11 @@ static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp,
>>> * give it one more chance */
>>> if (--reset_retries > 0)
>>> continue;
>>> + if (sense_valid &&
>>> + sshdr.sense_key == NOT_READY &&
>>> + sshdr.asc == 0x04 && sshdr.ascq == 0x0A)
>>> + /* ALUA state transition; always retry */
>>> + continue;
>>> }
>>> retries--;
>>>
>>> @@ -2069,6 +2074,11 @@ static int read_capacity_10(struct scsi_disk *sdkp, struct scsi_device *sdp,
>>> * give it one more chance */
>>> if (--reset_retries > 0)
>>> continue;
>>> + if (sense_valid &&
>>> + sshdr.sense_key == NOT_READY &&
>>> + sshdr.asc == 0x04 && sshdr.ascq == 0x0A)
>>> + /* ALUA state transition; always retry */
>>> + continue;
>>> }
>>> retries--;
>>>
>>
>> Got to say I really don't like this infinite retry possibility. How
>> long does the ALUA transition take? Would increasing retries work (or
>> even hijacking reset_retries)?
>>
> Well ... transitioning could be quite long (NetApp FAS has a
> transition timeout of 30 _minutes_ ...).
Well, actually NetApp FAS has a transition timeout of 2 minutes, and not
30 minutes - as reported in the IMPLICIT TRANSITION TIMEOUT value in the
extended RTPG data.
-Martin
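For reference, the value mentioned above lives in the extended header of the REPORT TARGET PORT GROUPS response. The sketch below assumes the SPC-4 layout as I understand it (byte 4, bits 6..4 give the return data format, 001b meaning extended header; byte 5 gives the implicit transition time in seconds). The function name and the fallback parameter are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define RTPG_FMT_MASK    0x70	/* byte 4, bits 6..4: return data format */
#define RTPG_FMT_EXT_HDR 0x10	/* 001b: extended header present */

/*
 * Return the implicit transition time in seconds as reported by the
 * device, or 'fallback' if the response uses the short header, is
 * truncated, or reports 0 (no value supplied).
 */
static unsigned int rtpg_transition_timeout(const unsigned char *buf,
					    size_t len,
					    unsigned int fallback)
{
	assert(buf != NULL);
	if (len < 8)
		return fallback;	/* too short for the extended header */
	if ((buf[4] & RTPG_FMT_MASK) == RTPG_FMT_EXT_HDR && buf[5] != 0)
		return buf[5];
	return fallback;
}
```

A device reporting 2 minutes would carry 120 (0x78) in byte 5; a caller could use the returned value as the upper bound for how long to keep retrying.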
* Re: [PATCH] sd: always retry READ CAPACITY for ALUA state transition
2015-04-30 12:26 ` Hannes Reinecke
2015-05-01 12:39 ` Martin George
@ 2015-05-01 13:22 ` James Bottomley
1 sibling, 0 replies; 8+ messages in thread
From: James Bottomley @ 2015-05-01 13:22 UTC
To: Hannes Reinecke; +Cc: Christoph Hellwig, linux-scsi
On Thu, 2015-04-30 at 14:26 +0200, Hannes Reinecke wrote:
> On 04/28/2015 11:18 PM, James Bottomley wrote:
> > On Mon, 2015-04-27 at 11:35 +0200, Hannes Reinecke wrote:
> >> During ALUA state transitions the device might return
> >> a sense code 02/04/0a (Logical unit not accessible, asymmetric
> >> access state transition). As this is a transient error
> >> we should just retry the READ CAPACITY call until
> >> the state transition finishes and the correct
> >> capacity can be returned.
> >>
> >> Signed-off-by: Hannes Reinecke <hare@suse.de>
> >> ---
> >> drivers/scsi/sd.c | 10 ++++++++++
> >> 1 file changed, 10 insertions(+)
> >>
> >> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> >> index 79beebf..7178b05 100644
> >> --- a/drivers/scsi/sd.c
> >> +++ b/drivers/scsi/sd.c
> >> @@ -1987,6 +1987,11 @@ static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp,
> >> * give it one more chance */
> >> if (--reset_retries > 0)
> >> continue;
> >> + if (sense_valid &&
> >> + sshdr.sense_key == NOT_READY &&
> >> + sshdr.asc == 0x04 && sshdr.ascq == 0x0A)
> >> + /* ALUA state transition; always retry */
> >> + continue;
> >> }
> >> retries--;
> >>
> >> @@ -2069,6 +2074,11 @@ static int read_capacity_10(struct scsi_disk *sdkp, struct scsi_device *sdp,
> >> * give it one more chance */
> >> if (--reset_retries > 0)
> >> continue;
> >> + if (sense_valid &&
> >> + sshdr.sense_key == NOT_READY &&
> >> + sshdr.asc == 0x04 && sshdr.ascq == 0x0A)
> >> + /* ALUA state transition; always retry */
> >> + continue;
> >> }
> >> retries--;
> >>
> >
> > Got to say I really don't like this infinite retry possibility. How
> > long does the ALUA transition take? Would increasing retries work (or
> > even hijacking reset_retries)?
> >
> Well ... transitioning could be quite long (NetApp FAS has a
> transition timeout of 30 _minutes_ ...).
> But yeah, I can see about limiting this somewhat.
I think that might be a good idea. We can't hold this device (and the
corresponding asynchronous probe thread) in a continuous loop for 30
minutes ...
James
* [PATCH] sd: always retry READ CAPACITY for ALUA state transition
@ 2017-10-17 7:11 Hannes Reinecke
2017-10-17 13:57 ` James Bottomley
0 siblings, 1 reply; 8+ messages in thread
From: Hannes Reinecke @ 2017-10-17 7:11 UTC
To: Martin K. Petersen
Cc: Christoph Hellwig, James Bottomley, linux-scsi, Hannes Reinecke
During ALUA state transitions the device might return
a sense code 02/04/0a (Logical unit not accessible, asymmetric
access state transition). As this is a transient error
we should just retry the READ CAPACITY call until
the state transition finishes and the correct
capacity can be returned.
Signed-off-by: Hannes Reinecke <hare@suse.de>
---
drivers/scsi/sd.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 37daf9a..b4647f5 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -2333,6 +2333,11 @@ static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp,
 				 * give it one more chance */
 				if (--reset_retries > 0)
 					continue;
+			if (sense_valid &&
+			    sshdr.sense_key == NOT_READY &&
+			    sshdr.asc == 0x04 && sshdr.ascq == 0x0A)
+				/* ALUA state transition; always retry */
+				continue;
 		}
 		retries--;

@@ -2418,6 +2423,11 @@ static int read_capacity_10(struct scsi_disk *sdkp, struct scsi_device *sdp,
 				 * give it one more chance */
 				if (--reset_retries > 0)
 					continue;
+			if (sense_valid &&
+			    sshdr.sense_key == NOT_READY &&
+			    sshdr.asc == 0x04 && sshdr.ascq == 0x0A)
+				/* ALUA state transition; always retry */
+				continue;
 		}
 		retries--;
--
1.8.5.6
* Re: [PATCH] sd: always retry READ CAPACITY for ALUA state transition
2017-10-17 7:11 Hannes Reinecke
@ 2017-10-17 13:57 ` James Bottomley
2017-10-18 5:54 ` Hannes Reinecke
0 siblings, 1 reply; 8+ messages in thread
From: James Bottomley @ 2017-10-17 13:57 UTC
To: Hannes Reinecke, Martin K. Petersen; +Cc: Christoph Hellwig, linux-scsi
On Tue, 2017-10-17 at 09:11 +0200, Hannes Reinecke wrote:
> During ALUA state transitions the device might return
> a sense code 02/04/0a (Logical unit not accessible, asymmetric
> access state transition). As this is a transient error
> we should just retry the READ CAPACITY call until
> the state transition finishes and the correct
> capacity can be returned.
This will lock up the system if some ALUA target gets into a state
where it always returns transitioning and never completes, which
doesn't look like the best way to handle problem devices.
I thought that after the ALUA transition the LUN raises a unit attention ...
can't you use that in some way to trigger the capacity re-read, i.e. do
asynchronous event notification instead of polling forever?
James
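The event-driven alternative described above could be split into two decisions: stop polling when READ CAPACITY fails with the transitioning sense code, and re-read the capacity later when a matching unit attention arrives. The sketch below is only an illustration under assumptions: the type and function names are hypothetical, and it assumes the device actually raises UNIT ATTENTION with ASC/ASCQ 2Ah/06h (asymmetric access state changed) or 2Ah/09h (capacity data has changed), which may vary between arrays.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NOT_READY      0x02
#define UNIT_ATTENTION 0x06

/* Hypothetical stand-in for the kernel's struct scsi_sense_hdr. */
struct sense_hdr {
	unsigned char sense_key, asc, ascq;
};

enum rc_action {
	RC_FAIL,	/* give up: report the error */
	RC_DEFER	/* stop polling; wait for a unit attention instead */
};

/* Decide what to do when READ CAPACITY fails with valid sense data. */
static enum rc_action on_read_capacity_error(const struct sense_hdr *s)
{
	assert(s != NULL);
	/* 02/04/0A: LU not accessible, asymmetric access state transition */
	if (s->sense_key == NOT_READY && s->asc == 0x04 && s->ascq == 0x0A)
		return RC_DEFER;
	return RC_FAIL;
}

/* Decide whether a later sense report should trigger a capacity re-read. */
static bool unit_attention_triggers_reread(const struct sense_hdr *s)
{
	assert(s != NULL);
	if (s->sense_key != UNIT_ATTENTION || s->asc != 0x2A)
		return false;
	/* 2A/06: asymmetric access state changed; 2A/09: capacity changed */
	return s->ascq == 0x06 || s->ascq == 0x09;
}
```

The point of the split is that the probe thread is released as soon as RC_DEFER is returned, so a device stuck in the transitioning state costs nothing beyond a missing capacity.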
* Re: [PATCH] sd: always retry READ CAPACITY for ALUA state transition
2017-10-17 13:57 ` James Bottomley
@ 2017-10-18 5:54 ` Hannes Reinecke
0 siblings, 0 replies; 8+ messages in thread
From: Hannes Reinecke @ 2017-10-18 5:54 UTC
To: James Bottomley, Martin K. Petersen; +Cc: Christoph Hellwig, linux-scsi
On 10/17/2017 03:57 PM, James Bottomley wrote:
> On Tue, 2017-10-17 at 09:11 +0200, Hannes Reinecke wrote:
>> During ALUA state transitions the device might return
>> a sense code 02/04/0a (Logical unit not accessible, asymmetric
>> access state transition). As this is a transient error
>> we should just retry the READ CAPACITY call until
>> the state transition finishes and the correct
>> capacity can be returned.
>
> This will lock up the system if some ALUA target gets into a state
> where it always returns transitioning and never completes, which
> doesn't look like the best way to handle problem devices.
>
> I thought that after the ALUA transition the LUN raises a unit attention ...
> can't you use that in some way to trigger the capacity re-read, i.e. do
> asynchronous event notification instead of polling forever?
>
Hmm.
Will give it a try.
Cheers,
Hannes
--
Dr. Hannes Reinecke Teamlead Storage & Networking
hare@suse.de +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
Thread overview: 8+ messages
2015-04-27 9:35 [PATCH] sd: always retry READ CAPACITY for ALUA state transition Hannes Reinecke
2015-04-28 21:18 ` James Bottomley
2015-04-30 12:26 ` Hannes Reinecke
2015-05-01 12:39 ` Martin George
2015-05-01 13:22 ` James Bottomley
-- strict thread matches above, loose matches on Subject: below --
2017-10-17 7:11 Hannes Reinecke
2017-10-17 13:57 ` James Bottomley
2017-10-18 5:54 ` Hannes Reinecke