[PATCH] scsi_error: do not allow IO errors with certain ILLEGAL

linux-scsi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH] scsi_error: do not allow IO errors with certain ILLEGAL_REQUEST sense to be retryable
@ 2011-12-02 20:31 Mike Snitzer
  2011-12-02 21:04 ` James Bottomley
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Mike Snitzer @ 2011-12-02 20:31 UTC (permalink / raw)
  To: linux-scsi
  Cc: James Bottomley, Hannes Reinecke, Martin K. Petersen,
	Mike Snitzer

Thin provisioned LUNs from multiple array vendors have failed WRITE SAME
(16) w/ UNMAP bit set with ILLEGAL_REQUEST sense.  With additional sense
0x24 and 0x26 respectively.

In both instances the target would always fail the CDB no matter how
many retries were performed (permanent target failure rather than
transient path failure).  This resulted in mkfs.ext4's discard of a
multipath device looping indefinitely while failing paths.

Additional sense 0x20 and 0x21 were included because both represent
TARGET_ERRORs too.  But not all ILLEGAL_REQUEST sense can be treated as
a TARGET_ERROR.

One example where a retry might be desirable is described in SBC-3, in
the "4.18 Model for XOR commands"

"When the device server is not able to accept a new command because
there is not enough space in the buffer, the device server shall
terminate that command with CHECK CONDITION status with the sense key
set to ILLEGAL REQUEST and the additional sense code set to BUFFER
FULL."

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 drivers/scsi/scsi_error.c |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index dc6131e..33d3a50 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -366,6 +366,14 @@ static int scsi_check_sense(struct scsi_cmnd *scmd)
 			return TARGET_ERROR;
 
 	case ILLEGAL_REQUEST:
+		if (sshdr.asc == 0x20 || /* Invalid command operation code */
+		    sshdr.asc == 0x21 || /* Logical block address out of range */
+		    sshdr.asc == 0x24 || /* Invalid field in cdb */
+		    sshdr.asc == 0x26) { /* Parameter value invalid */
+			return TARGET_ERROR;
+		}
+		return SUCCESS;
+
 	default:
 		return SUCCESS;
 	}
-- 
1.7.4.4


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH] scsi_error: do not allow IO errors with certain ILLEGAL_REQUEST sense to be retryable
  2011-12-02 20:31 [PATCH] scsi_error: do not allow IO errors with certain ILLEGAL_REQUEST sense to be retryable Mike Snitzer
@ 2011-12-02 21:04 ` James Bottomley
  2011-12-02 22:04   ` Mike Snitzer
  2011-12-06 21:07 ` [PATCH] " Martin K. Petersen
  2012-02-13 23:35 ` [PATCH v2] [SCSI] scsi_error: classify some ILLEGAL_REQUEST sense as a permanent TARGET_ERROR Mike Snitzer
  2 siblings, 1 reply; 15+ messages in thread
From: James Bottomley @ 2011-12-02 21:04 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: linux-scsi, Hannes Reinecke, Martin K. Petersen

On Fri, 2011-12-02 at 15:31 -0500, Mike Snitzer wrote:
> Thin provisioned LUNs from multiple array vendors have failed WRITE SAME
> (16) w/ UNMAP bit set with ILLEGAL_REQUEST sense.  With additional sense
> 0x24 and 0x26 respectively.
> 
> In both instances the target would always fail the CDB no matter how
> many retries were performed (permanent target failure rather than
> transient path failure).  This resulted in mkfs.ext4's discard of a
> multipath device looping indefinitely while failing paths.

I don't quite understand this analysis.  ILLEGAL_REQUEST currently
always returns SUCCESS from scsi_check_sense().  That return is
propagated up to scsi_decide_disposition() which causes I/O completion.
We do have another gate for ILLEGAL_REQUEST in scsi_io_completion()
which can retry, but only if it's downshifting the command from _10 to
_6 ... so I don't get where you think the looping is coming from ... the
net effect of your patch is to change the error passed on to the block
layer in blk_end_request() from -EIO to -EREMOTEIO.  So it sounds like
if there is a retry problem it's above SCSI?

James

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: scsi_error: do not allow IO errors with certain ILLEGAL_REQUEST sense to be retryable
  2011-12-02 21:04 ` James Bottomley
@ 2011-12-02 22:04   ` Mike Snitzer
  0 siblings, 0 replies; 15+ messages in thread
From: Mike Snitzer @ 2011-12-02 22:04 UTC (permalink / raw)
  To: James Bottomley; +Cc: linux-scsi, Hannes Reinecke, Martin K. Petersen

On Fri, Dec 02 2011 at  4:04pm -0500,
James Bottomley <James.Bottomley@HansenPartnership.com> wrote:

> On Fri, 2011-12-02 at 15:31 -0500, Mike Snitzer wrote:
> > Thin provisioned LUNs from multiple array vendors have failed WRITE SAME
> > (16) w/ UNMAP bit set with ILLEGAL_REQUEST sense.  With additional sense
> > 0x24 and 0x26 respectively.
> > 
> > In both instances the target would always fail the CDB no matter how
> > many retries were performed (permanent target failure rather than
> > transient path failure).  This resulted in mkfs.ext4's discard of a
> > multipath device looping indefinitely while failing paths.
> 
> I don't quite understand this analysis.  ILLEGAL_REQUEST currently
> always returns SUCCESS from scsi_check_sense().  That return is
> propagated up to scsi_decide_disposition() which causes I/O completion.
> We do have another gate for ILLEGAL_REQUEST in scsi_io_completion()
> which can retry, but only if it's downshifting the command from _10 to
> _6 ... so I don't get where you think the looping is coming from ... the
> net effect of your patch is to change the error passed on to the block
> layer in blk_end_request() from -EIO to -EREMOTEIO.  So it sounds like
> if there is a retry problem it's above SCSI?

Exactly, the looping is in dm-multipath.  Because scsi_check_sense() is
returning SUCCESS for these ILLEGAL_REQUEST, multipath is retrying the
discard after failing the path that the request just failed on.
Previously failed paths are recovered in time for the next retry of the
discard that will _always_ fail... and so the cycle goes (and mkfs.ext4
appears hung).

commit 63583cca745f440 ([SCSI] Add detailed SCSI I/O errors) enabled
mpath to immediately return target errors (-EREMOTEIO) without retry
after path failure -- so the change to scsi_check_sense() is selectively
returning TARGET_ERROR for ILLEGAL REQUEST; which will result in
-EREMOTEIO.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] scsi_error: do not allow IO errors with certain ILLEGAL_REQUEST sense to be retryable
  2011-12-02 20:31 [PATCH] scsi_error: do not allow IO errors with certain ILLEGAL_REQUEST sense to be retryable Mike Snitzer
  2011-12-02 21:04 ` James Bottomley
@ 2011-12-06 21:07 ` Martin K. Petersen
  2011-12-06 21:27   ` Mike Snitzer
  2012-02-13 23:35 ` [PATCH v2] [SCSI] scsi_error: classify some ILLEGAL_REQUEST sense as a permanent TARGET_ERROR Mike Snitzer
  2 siblings, 1 reply; 15+ messages in thread
From: Martin K. Petersen @ 2011-12-06 21:07 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: linux-scsi, James Bottomley, Hannes Reinecke, Martin K. Petersen

>>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes:

Mike> Thin provisioned LUNs from multiple array vendors have failed
Mike> WRITE SAME(16) w/ UNMAP bit set with ILLEGAL_REQUEST sense.

Can you share what these arrays are returning in block limits and the
logical block provisioning VPD? It sounds like they are returning
inconsistent data causing us to issue the wrong command...

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: scsi_error: do not allow IO errors with certain ILLEGAL_REQUEST sense to be retryable
  2011-12-06 21:07 ` [PATCH] " Martin K. Petersen
@ 2011-12-06 21:27   ` Mike Snitzer
  2011-12-06 22:03     ` Martin K. Petersen
  0 siblings, 1 reply; 15+ messages in thread
From: Mike Snitzer @ 2011-12-06 21:27 UTC (permalink / raw)
  To: Martin K. Petersen; +Cc: linux-scsi, James Bottomley, Hannes Reinecke

On Tue, Dec 06 2011 at  4:07pm -0500,
Martin K. Petersen <martin.petersen@oracle.com> wrote:

> >>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes:
> 
> Mike> Thin provisioned LUNs from multiple array vendors have failed
> Mike> WRITE SAME(16) w/ UNMAP bit set with ILLEGAL_REQUEST sense.
> 
> Can you share what these arrays are returning in block limits and the
> logical block provisioning VPD? It sounds like they are returning
> inconsistent data causing us to issue the wrong command...

I don't have that info.  But I can ask for it.

In the first instance the array had discard alignment constraints
(needed to be a multiple of 4k, vendor has since relaxed that).  For the
second case, the reason for why the array failed the CDB hasn't been
shared yet (other than it was an "Invalid field in cdb").

Regardless, shouldn't the SCSI midlayer classify such ILLEGAL_REQUEST
sense, with an add. sense I listed in the patch, as a target error?

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: scsi_error: do not allow IO errors with certain ILLEGAL_REQUEST sense to be retryable
  2011-12-06 21:27   ` Mike Snitzer
@ 2011-12-06 22:03     ` Martin K. Petersen
  2011-12-06 22:42       ` Mike Snitzer
  0 siblings, 1 reply; 15+ messages in thread
From: Martin K. Petersen @ 2011-12-06 22:03 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Martin K. Petersen, linux-scsi, James Bottomley, Hannes Reinecke

>>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes:

Mike> Regardless, shouldn't the SCSI midlayer classify such
Mike> ILLEGAL_REQUEST sense, with an add. sense I listed in the patch,
Mike> as a target error?

Well, even SUCCESS should cause the I/O to be aborted.

I assume this is the RHEL6 kernel? Did you backport my provisioning
updates that brings the heuristics in sync with SBC-3 (#c98a0e)?

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: scsi_error: do not allow IO errors with certain ILLEGAL_REQUEST sense to be retryable
  2011-12-06 22:03     ` Martin K. Petersen
@ 2011-12-06 22:42       ` Mike Snitzer
  2012-02-13 16:29         ` Mike Snitzer
  0 siblings, 1 reply; 15+ messages in thread
From: Mike Snitzer @ 2011-12-06 22:42 UTC (permalink / raw)
  To: Martin K. Petersen; +Cc: linux-scsi, James Bottomley, Hannes Reinecke

On Tue, Dec 06 2011 at  5:03pm -0500,
Martin K. Petersen <martin.petersen@oracle.com> wrote:

> >>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes:
> 
> Mike> Regardless, shouldn't the SCSI midlayer classify such
> Mike> ILLEGAL_REQUEST sense, with an add. sense I listed in the patch,
> Mike> as a target error?
> 
> Well, even SUCCESS should cause the I/O to be aborted.

As I replied to James, yes SUCCESS does cause the IO to fail, but the
discard gets retried by multipath.

Returning TARGET_ERROR enables the block layer to return -EREMOTEIO
which multipath will immediately pass up (rather than the normal fail
path and retry).
 
> I assume this is the RHEL6 kernel? Did you backport my provisioning
> updates that brings the heuristics in sync with SBC-3 (#c98a0e)?

Yes, that update was pulled in to RHEL6.2 (released today).  But this
issue is a concern for both upstream and RHEL6 (and any other distro
with a recent kernel).

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: scsi_error: do not allow IO errors with certain ILLEGAL_REQUEST sense to be retryable
  2011-12-06 22:42       ` Mike Snitzer
@ 2012-02-13 16:29         ` Mike Snitzer
  2012-02-13 17:53           ` Martin K. Petersen
  0 siblings, 1 reply; 15+ messages in thread
From: Mike Snitzer @ 2012-02-13 16:29 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: linux-scsi, James Bottomley, Hannes Reinecke, linux-kernel

On Tue, Dec 06 2011 at  5:42pm -0500,
Mike Snitzer <snitzer@redhat.com> wrote:

> On Tue, Dec 06 2011 at  5:03pm -0500,
> Martin K. Petersen <martin.petersen@oracle.com> wrote:
> 
> > >>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes:
> > 
> > Mike> Regardless, shouldn't the SCSI midlayer classify such
> > Mike> ILLEGAL_REQUEST sense, with an add. sense I listed in the patch,
> > Mike> as a target error?
> > 
> > Well, even SUCCESS should cause the I/O to be aborted.
> 
> As I replied to James, yes SUCCESS does cause the IO to fail, but the
> discard gets retried by multipath.
> 
> Returning TARGET_ERROR enables the block layer to return -EREMOTEIO
> which multipath will immediately pass up (rather than the normal fail
> path and retry).
>  
> > I assume this is the RHEL6 kernel? Did you backport my provisioning
> > updates that brings the heuristics in sync with SBC-3 (#c98a0e)?
> 
> Yes, that update was pulled in to RHEL6.2 (released today).  But this
> issue is a concern for both upstream and RHEL6 (and any other distro
> with a recent kernel).

Hey Martin and James,

(for the benefit of others, original proposed fix is here):
http://www.spinics.net/lists/linux-scsi/msg55792.html

I've had 2 additional reports from different storage vendors that they
(or their customers) are having Linux (RHEL6) installation failures.
The reason is their storage is broken:
- they set TPE in the storage target's READ CAPACITY response
- but they only support UNMAP (not WRITE SAME w/ UNMAP bit set)
  - but they don't properly populate the BLOCK LIMITS VPD; so Linux
    defaults to using WRITE SAME w/ UNMAP bit set.

So that makes 3 different _prominent_ storage vendors, that I am aware
of, that are bitten by their broken storage (relative to discard and
properly advertising which variant they actually support).  I'd much
rather deal with the storage vendors (or their customers) reporting that
discards aren't working than mutual customers reporting that they cannot
even install to the storage.

The ultimate fix is clear: storage vendors need to fix their storage
(2 of the 3 have, 1 is working on it).  But a Linux-only workaround for
this series of unfortunate events (particularly as it happens with
multipath in the mix) is to have SCSI classify certain ILLEGAL_REQUEST
as the TARGET_ERROR that they are.

I would very much appreciate this fix making its way to Linux 3.3 (even
though we're past the merge window this is a pure fix).

It will allow Linux to cope with vendors' storage that is broken.

Please advise, Ack, submit for upstream 3.3 inclusion, etc... thanks! :)

Mike

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: scsi_error: do not allow IO errors with certain ILLEGAL_REQUEST sense to be retryable
  2012-02-13 16:29         ` Mike Snitzer
@ 2012-02-13 17:53           ` Martin K. Petersen
  2012-02-13 18:13             ` Mike Snitzer
  0 siblings, 1 reply; 15+ messages in thread
From: Martin K. Petersen @ 2012-02-13 17:53 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Martin K. Petersen, linux-scsi, James Bottomley, Hannes Reinecke,
	linux-kernel

>>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes:

Mike> So that makes 3 different _prominent_ storage vendors, that I am
Mike> aware of, that are bitten by their broken storage (relative to
Mike> discard and properly advertising which variant they actually
Mike> support).  I'd much rather deal with the storage vendors (or their
Mike> customers) reporting that discards aren't working than mutual
Mike> customers reporting that they cannot even install to the storage.

More graceful handling of the sense data aside, we do have a couple of
options:

 1. Now that the provisioning portion seems to be stable in SBC-3 we can
    nuke the interim spec heuristics and only support devices that
    report the right thing. This may disable provisioning for some
    existing users whose arrays run non-compliant firmware.

 2. We can add another layer of heuristics based on the RSOC wrapper I
    introduced for write same. Maybe you could send me sg_opcodes output
    for the arrays in question?

Mike> The ultimate fix is clear: storage vendors need to fix their
Mike> storage (2 of the 3 have, 1 is working on it).  But a Linux-only
Mike> workaround for this series of unfortunate events (particularly as
Mike> it happens with multipath in the mix) is to have SCSI classify
Mike> certain ILLEGAL_REQUEST as the TARGET_ERROR that they are.

I don't have a fundamental problem with your patch. But since we
explicitly handle ILLEGAL REQUEST with 0x20 and 0x24 in sd.c I wonder
what's broken? We should disable discard support if the WRITE SAME w/
UNMAP fails.

I'll see if I can figure out what's going on unless you beat me to it...

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: scsi_error: do not allow IO errors with certain ILLEGAL_REQUEST sense to be retryable
  2012-02-13 17:53           ` Martin K. Petersen
@ 2012-02-13 18:13             ` Mike Snitzer
  2012-02-13 18:59               ` Mike Snitzer
  2012-02-13 19:16               ` Martin K. Petersen
  0 siblings, 2 replies; 15+ messages in thread
From: Mike Snitzer @ 2012-02-13 18:13 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: linux-scsi, James Bottomley, Hannes Reinecke, linux-kernel

On Mon, Feb 13 2012 at 12:53pm -0500,
Martin K. Petersen <martin.petersen@oracle.com> wrote:

> >>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes:
> 
> Mike> So that makes 3 different _prominent_ storage vendors, that I am
> Mike> aware of, that are bitten by their broken storage (relative to
> Mike> discard and properly advertising which variant they actually
> Mike> support).  I'd much rather deal with the storage vendors (or their
> Mike> customers) reporting that discards aren't working than mutual
> Mike> customers reporting that they cannot even install to the storage.
> 
> More graceful handling of the sense data aside, we do have a couple of
> options:
> 
>  1. Now that the provisioning portion seems to be stable in SBC-3 we can
>     nuke the interim spec heuristics and only support devices that
>     report the right thing. This may disable provisioning for some
>     existing users whose arrays run non-compliant firmware.
> 
>  2. We can add another layer of heuristics based on the RSOC wrapper I
>     introduced for write same. Maybe you could send me sg_opcodes output
>     for the arrays in question?

Yeah, I think that would be welcomed evolution (but as you say,
independent of improving additional ILLEGAL REQUEST processing).

> Mike> The ultimate fix is clear: storage vendors need to fix their
> Mike> storage (2 of the 3 have, 1 is working on it).  But a Linux-only
> Mike> workaround for this series of unfortunate events (particularly as
> Mike> it happens with multipath in the mix) is to have SCSI classify
> Mike> certain ILLEGAL_REQUEST as the TARGET_ERROR that they are.
> 
> I don't have a fundamental problem with your patch. But since we
> explicitly handle ILLEGAL REQUEST with 0x20 and 0x24 in sd.c I wonder
> what's broken? We should disable discard support if the WRITE SAME w/
> UNMAP fails.

Yeah, I thought the disabling would be sufficient too.  But
unfortunately multipath doesn't inspect the request it is retrying
(after it fails the path the request just failed on).  So even though
discards get disabled: the first discard (which caused discards to
become disabled) is still in-flight and keeps getting retried
indefinitely by the multipath layer (if the paths recover quickly).

Mike

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: scsi_error: do not allow IO errors with certain ILLEGAL_REQUEST sense to be retryable
  2012-02-13 18:13             ` Mike Snitzer
@ 2012-02-13 18:59               ` Mike Snitzer
  2012-02-13 19:16               ` Martin K. Petersen
  1 sibling, 0 replies; 15+ messages in thread
From: Mike Snitzer @ 2012-02-13 18:59 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: linux-scsi, James Bottomley, Hannes Reinecke, linux-kernel

On Mon, Feb 13 2012 at  1:13pm -0500,
Mike Snitzer <snitzer@redhat.com> wrote:

> On Mon, Feb 13 2012 at 12:53pm -0500,
> Martin K. Petersen <martin.petersen@oracle.com> wrote:
> 
> > >>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes:
> > 
> > Mike> So that makes 3 different _prominent_ storage vendors, that I am
> > Mike> aware of, that are bitten by their broken storage (relative to
> > Mike> discard and properly advertising which variant they actually
> > Mike> support).  I'd much rather deal with the storage vendors (or their
> > Mike> customers) reporting that discards aren't working than mutual
> > Mike> customers reporting that they cannot even install to the storage.
> > 
> > More graceful handling of the sense data aside, we do have a couple of
> > options:
> > 
> >  1. Now that the provisioning portion seems to be stable in SBC-3 we can
> >     nuke the interim spec heuristics and only support devices that
> >     report the right thing. This may disable provisioning for some
> >     existing users whose arrays run non-compliant firmware.
> > 
> >  2. We can add another layer of heuristics based on the RSOC wrapper I
> >     introduced for write same. Maybe you could send me sg_opcodes output
> >     for the arrays in question?
> 
> Yeah, I think that would be welcomed evolution (but as you say,
> independent of improving additional ILLEGAL REQUEST processing).

That was a response to 1 above.

I don't have direct access to the arrays in question to get sg_opcodes.

But I can work on getting them.

Mike

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: scsi_error: do not allow IO errors with certain ILLEGAL_REQUEST sense to be retryable
  2012-02-13 18:13             ` Mike Snitzer
  2012-02-13 18:59               ` Mike Snitzer
@ 2012-02-13 19:16               ` Martin K. Petersen
  2012-02-13 19:36                 ` Mike Snitzer
  1 sibling, 1 reply; 15+ messages in thread
From: Martin K. Petersen @ 2012-02-13 19:16 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Martin K. Petersen, linux-scsi, James Bottomley, Hannes Reinecke,
	linux-kernel

>>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes:

>> I don't have a fundamental problem with your patch. But since we
>> explicitly handle ILLEGAL REQUEST with 0x20 and 0x24 in sd.c I wonder
>> what's broken? We should disable discard support if the WRITE SAME w/
>> UNMAP fails.

Mike> Yeah, I thought the disabling would be sufficient too.  But
Mike> unfortunately multipath doesn't inspect the request it is retrying
Mike> (after it fails the path the request just failed on).

Well, we shouldn't be returning something that multipath should ever act
on.

I think I understand what's going on. Can you try the following patch?

-- 
Martin K. Petersen	Oracle Linux Engineering


diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index b2c95db..4e8d0b6 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -879,6 +879,7 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
 				    cmd->cmnd[0] == WRITE_SAME_16 ||
 				    cmd->cmnd[0] == WRITE_SAME)) {
 				description = "Discard failure";
+				error = -EREMOTEIO;
 				action = ACTION_FAIL;
 			} else
 				action = ACTION_FAIL;
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index c691fb5..5f0d383 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -497,6 +497,8 @@ static void sd_config_discard(struct scsi_disk *sdkp, unsigned int mode)
 		max(sdkp->physical_block_size,
 		    sdkp->unmap_granularity * logical_block_size);
 
+	sdkp->provisioning_mode = mode;
+
 	switch (mode) {
 
 	case SD_LBP_DISABLE:
@@ -524,8 +526,6 @@ static void sd_config_discard(struct scsi_disk *sdkp, unsigned int mode)
 
 	q->limits.max_discard_sectors = max_blocks * (logical_block_size >> 9);
 	queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, q);
-
-	sdkp->provisioning_mode = mode;
 }
 
 /**

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: scsi_error: do not allow IO errors with certain ILLEGAL_REQUEST sense to be retryable
  2012-02-13 19:16               ` Martin K. Petersen
@ 2012-02-13 19:36                 ` Mike Snitzer
  2012-02-13 19:38                   ` James Bottomley
  0 siblings, 1 reply; 15+ messages in thread
From: Mike Snitzer @ 2012-02-13 19:36 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: linux-scsi, James Bottomley, Hannes Reinecke, linux-kernel

On Mon, Feb 13 2012 at  2:16pm -0500,
Martin K. Petersen <martin.petersen@oracle.com> wrote:

> >>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes:
> 
> >> I don't have a fundamental problem with your patch. But since we
> >> explicitly handle ILLEGAL REQUEST with 0x20 and 0x24 in sd.c I wonder
> >> what's broken? We should disable discard support if the WRITE SAME w/
> >> UNMAP fails.
> 
> Mike> Yeah, I thought the disabling would be sufficient too.  But
> Mike> unfortunately multipath doesn't inspect the request it is retrying
> Mike> (after it fails the path the request just failed on).
> 
> Well, we shouldn't be returning something that multipath should ever act
> on.
> 
> I think I understand what's going on. Can you try the following patch?

Looks good to me (small nit below), it'll solve the immediate problem,
I'll pass it on.  Please add my:

Acked-by: Mike Snitzer <snitzer@redhat.com>

But I also think establishing a baseline of TARGET_ERROR for certain
ILLEGAL REQUEST is still sane and should go in too...

Thanks,
Mike

> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index b2c95db..4e8d0b6 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -879,6 +879,7 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
>  				    cmd->cmnd[0] == WRITE_SAME_16 ||
>  				    cmd->cmnd[0] == WRITE_SAME)) {
>  				description = "Discard failure";
> +				error = -EREMOTEIO;
>  				action = ACTION_FAIL;

Previous DIX -EILSEQ code block sets error after action.  Should follow
that order here?  Purely an aesthetics thing.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: scsi_error: do not allow IO errors with certain ILLEGAL_REQUEST sense to be retryable
  2012-02-13 19:36                 ` Mike Snitzer
@ 2012-02-13 19:38                   ` James Bottomley
  0 siblings, 0 replies; 15+ messages in thread
From: James Bottomley @ 2012-02-13 19:38 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Martin K. Petersen, linux-scsi, Hannes Reinecke, linux-kernel

On Mon, 2012-02-13 at 14:36 -0500, Mike Snitzer wrote:
> On Mon, Feb 13 2012 at  2:16pm -0500,
> Martin K. Petersen <martin.petersen@oracle.com> wrote:
> 
> > >>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes:
> > 
> > >> I don't have a fundamental problem with your patch. But since we
> > >> explicitly handle ILLEGAL REQUEST with 0x20 and 0x24 in sd.c I wonder
> > >> what's broken? We should disable discard support if the WRITE SAME w/
> > >> UNMAP fails.
> > 
> > Mike> Yeah, I thought the disabling would be sufficient too.  But
> > Mike> unfortunately multipath doesn't inspect the request it is retrying
> > Mike> (after it fails the path the request just failed on).
> > 
> > Well, we shouldn't be returning something that multipath should ever act
> > on.
> > 
> > I think I understand what's going on. Can you try the following patch?
> 
> Looks good to me (small nit below), it'll solve the immediate problem,
> I'll pass it on.  Please add my:
> 
> Acked-by: Mike Snitzer <snitzer@redhat.com>
> 
> But I also think establishing a baseline of TARGET_ERROR for certain
> ILLEGAL REQUEST is still sane and should go in too...

So someone still needs to package up the final agreed version with a
nice changelog and send it to the list ...

James

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v2] [SCSI] scsi_error: classify some ILLEGAL_REQUEST sense as a permanent TARGET_ERROR
  2011-12-02 20:31 [PATCH] scsi_error: do not allow IO errors with certain ILLEGAL_REQUEST sense to be retryable Mike Snitzer
  2011-12-02 21:04 ` James Bottomley
  2011-12-06 21:07 ` [PATCH] " Martin K. Petersen
@ 2012-02-13 23:35 ` Mike Snitzer
  2 siblings, 0 replies; 15+ messages in thread
From: Mike Snitzer @ 2012-02-13 23:35 UTC (permalink / raw)
  To: linux-scsi; +Cc: James Bottomley, Hannes Reinecke, Martin K. Petersen, dm-devel

Permanent target failures are non-retryable and should be classified as
TARGET_ERROR; otherwise dm-multipath will retry an IO request that will
always fail at the target.

A SCSI command that fails with ILLEGAL_REQUEST sense and Additional
sense 0x20, 0x21, 0x24 or 0x26 represents a permanent TARGET_ERROR.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 drivers/scsi/scsi_error.c |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

v2: revised patch header to not be hyper-focused on WRITE SAME failures

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index dc6131e..33d3a50 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -366,6 +366,14 @@ static int scsi_check_sense(struct scsi_cmnd *scmd)
 			return TARGET_ERROR;
 
 	case ILLEGAL_REQUEST:
+		if (sshdr.asc == 0x20 || /* Invalid command operation code */
+		    sshdr.asc == 0x21 || /* Logical block address out of range */
+		    sshdr.asc == 0x24 || /* Invalid field in cdb */
+		    sshdr.asc == 0x26) { /* Parameter value invalid */
+			return TARGET_ERROR;
+		}
+		return SUCCESS;
+
 	default:
 		return SUCCESS;
 	}


^ permalink raw reply related	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2012-02-13 23:35 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-12-02 20:31 [PATCH] scsi_error: do not allow IO errors with certain ILLEGAL_REQUEST sense to be retryable Mike Snitzer
2011-12-02 21:04 ` James Bottomley
2011-12-02 22:04   ` Mike Snitzer
2011-12-06 21:07 ` [PATCH] " Martin K. Petersen
2011-12-06 21:27   ` Mike Snitzer
2011-12-06 22:03     ` Martin K. Petersen
2011-12-06 22:42       ` Mike Snitzer
2012-02-13 16:29         ` Mike Snitzer
2012-02-13 17:53           ` Martin K. Petersen
2012-02-13 18:13             ` Mike Snitzer
2012-02-13 18:59               ` Mike Snitzer
2012-02-13 19:16               ` Martin K. Petersen
2012-02-13 19:36                 ` Mike Snitzer
2012-02-13 19:38                   ` James Bottomley
2012-02-13 23:35 ` [PATCH v2] [SCSI] scsi_error: classify some ILLEGAL_REQUEST sense as a permanent TARGET_ERROR Mike Snitzer

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).