From mboxrd@z Thu Jan 1 00:00:00 1970
From: James Bottomley
Subject: Re: [PATCH] sd: always retry READ CAPACITY for ALUA state transition
Date: Fri, 01 May 2015 06:22:51 -0700
Message-ID: <1430486571.2192.1.camel@HansenPartnership.com>
References: <1430127309-90412-1-git-send-email-hare@suse.de>
	<1430255925.2181.16.camel@HansenPartnership.com>
	<55421F70.80102@suse.de>
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
Return-path:
Received: from bedivere.hansenpartnership.com ([66.63.167.143]:43422 "EHLO
	bedivere.hansenpartnership.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1752828AbbEANWx (ORCPT ); Fri, 1 May 2015 09:22:53 -0400
In-Reply-To: <55421F70.80102@suse.de>
Sender: linux-scsi-owner@vger.kernel.org
List-Id: linux-scsi@vger.kernel.org
To: Hannes Reinecke
Cc: Christoph Hellwig , linux-scsi@vger.kernel.org

On Thu, 2015-04-30 at 14:26 +0200, Hannes Reinecke wrote:
> On 04/28/2015 11:18 PM, James Bottomley wrote:
> > On Mon, 2015-04-27 at 11:35 +0200, Hannes Reinecke wrote:
> >> During ALUA state transitions the device might return
> >> a sense code 02/04/0a (Logical unit not accessible, asymmetric
> >> access state transition). As this is a transient error
> >> we should just retry the READ CAPACITY call until
> >> the state transition finishes and the correct
> >> capacity can be returned.
> >>
> >> Signed-off-by: Hannes Reinecke
> >> ---
> >>  drivers/scsi/sd.c | 10 ++++++++++
> >>  1 file changed, 10 insertions(+)
> >>
> >> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> >> index 79beebf..7178b05 100644
> >> --- a/drivers/scsi/sd.c
> >> +++ b/drivers/scsi/sd.c
> >> @@ -1987,6 +1987,11 @@ static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp,
> >>  				 * give it one more chance */
> >>  				if (--reset_retries > 0)
> >>  					continue;
> >> +			if (sense_valid &&
> >> +			    sshdr.sense_key == NOT_READY &&
> >> +			    sshdr.asc == 0x04 && sshdr.ascq == 0x0A)
> >> +				/* ALUA state transition; always retry */
> >> +				continue;
> >>  		}
> >>  		retries--;
> >>
> >> @@ -2069,6 +2074,11 @@ static int read_capacity_10(struct scsi_disk *sdkp, struct scsi_device *sdp,
> >>  				 * give it one more chance */
> >>  				if (--reset_retries > 0)
> >>  					continue;
> >> +			if (sense_valid &&
> >> +			    sshdr.sense_key == NOT_READY &&
> >> +			    sshdr.asc == 0x04 && sshdr.ascq == 0x0A)
> >> +				/* ALUA state transition; always retry */
> >> +				continue;
> >>  		}
> >>  		retries--;
> >>
> >
> > Got to say I really don't like this infinite retry possibility.  How
> > long does the ALUA transition take?  Would increasing retries work (or
> > even hijacking reset_retries)?
> >
> Well ... transitioning could be quite long (NetApp FAS has a
> transition timeout of 30 _minutes_ ...).
> But yeah, I could see to limit this somewhat.

I think that might be a good idea.  We can't hold this device (and the
corresponding asynchronous probe thread) in a continuous loop for 30
minutes ...

James
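
For illustration only, here is one way the "limit this somewhat" idea could look. This is a minimal user-space sketch, not the sd.c code and not a proposed patch: struct sense_hdr, read_cap_fn, read_capacity_bounded(), struct fake_dev and fake_issue() are all hypothetical names made up for the example, and the separate alua_retries budget is an assumption about how the bound might be expressed. Only the sense triple 02/04/0a and the NOT_READY value come from the thread and the SCSI headers.

```c
#include <stdbool.h>
#include <stdint.h>

#define NOT_READY 0x02	/* SCSI sense key value, as in the kernel headers */

/* Hypothetical, simplified stand-in for the kernel's struct scsi_sense_hdr. */
struct sense_hdr {
	uint8_t sense_key;
	uint8_t asc;
	uint8_t ascq;
};

/* Hypothetical callback: issue one READ CAPACITY, fill in sense data on
 * failure, return 0 on success and non-zero otherwise. */
typedef int (*read_cap_fn)(void *ctx, struct sense_hdr *sshdr);

/* True for sense 02/04/0a: LU not accessible, ALUA state transition. */
static bool is_alua_transition(const struct sense_hdr *sshdr)
{
	return sshdr->sense_key == NOT_READY &&
	       sshdr->asc == 0x04 && sshdr->ascq == 0x0A;
}

/*
 * Retry READ CAPACITY with two budgets: 'retries' for ordinary failures and
 * 'alua_retries' for the transient ALUA transition case. Once the ALUA
 * budget is used up, further 02/04/0a responses count against 'retries',
 * so the loop always terminates.
 * Returns 0 on success, -1 when the budgets are exhausted.
 */
static int read_capacity_bounded(read_cap_fn issue, void *ctx,
				 int retries, int alua_retries)
{
	struct sense_hdr sshdr;

	while (retries > 0) {
		sshdr = (struct sense_hdr){0};
		if (issue(ctx, &sshdr) == 0)
			return 0;
		if (is_alua_transition(&sshdr) && alua_retries > 0) {
			alua_retries--;	/* transient: do not consume 'retries' */
			continue;
		}
		retries--;
	}
	return -1;
}

/* Fake device for the example: answers 02/04/0a for the first
 * 'transitioning_for' calls, then succeeds. */
struct fake_dev {
	int transitioning_for;
	int calls;
};

static int fake_issue(void *ctx, struct sense_hdr *sshdr)
{
	struct fake_dev *dev = ctx;

	dev->calls++;
	if (dev->transitioning_for > 0) {
		dev->transitioning_for--;
		sshdr->sense_key = NOT_READY;
		sshdr->asc = 0x04;
		sshdr->ascq = 0x0A;
		return -1;
	}
	return 0;
}
```

With a fake device that stays in transition for 5 calls, read_capacity_bounded(fake_issue, &dev, 3, 10) succeeds on the sixth call; with a device that never leaves the transition state it gives up after the ALUA budget plus the ordinary retries. A real implementation would more likely bound the wait by elapsed time and sleep between attempts, which this sketch leaves out.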