Linux SCSI subsystem development
From: Mike Christie <michael.christie@oracle.com>
To: Niklas Cassel <niklas.cassel@wdc.com>,
	"James E.J. Bottomley" <jejb@linux.ibm.com>,
	"Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Hannes Reinecke <hare@suse.de>,
	linux-scsi@vger.kernel.org,
	Damien Le Moal <damien.lemoal@opensource.wdc.com>
Subject: Re: [PATCH 23/25] scsi: sd: handle read/write CDL timeout failures
Date: Thu, 8 Dec 2022 18:13:16 -0600
Message-ID: <2fe5212b-308a-54a2-cf44-9b2895132f74@oracle.com>
In-Reply-To: <20221208105947.2399894-24-niklas.cassel@wdc.com>

On 12/8/22 4:59 AM, Niklas Cassel wrote:
> Commands using a duration limit descriptor that has limit policies set
> to a value other than 0x0 may be failed by the device if one of the
> limits is exceeded. For such commands, since the failure is the result
> of the user duration limit configuration and workload, the commands
> should not be retried and should be terminated immediately. Furthermore,
> to allow the user to differentiate these "soft" failures from hard
> errors due to hardware problems, an error code other than EIO should be
> returned.
> 
> There are 2 cases to consider:
> (1) The failure is due to a limit policy failing the command with a
> check condition sense key, that is, any limit policy other than 0xD.
> For this case, scsi_check_sense() is modified to detect failures with
> the ABORTED COMMAND sense key and the COMMAND TIMEOUT BEFORE PROCESSING
> or COMMAND TIMEOUT DURING PROCESSING or COMMAND TIMEOUT DURING
> PROCESSING DUE TO ERROR RECOVERY additional sense code. For these
> failures, a SUCCESS disposition is returned so that
> scsi_finish_command() is called to terminate the command.
> 
> (2) The failure is due to a limit policy set to 0xD, which results in
> the command being terminated with a GOOD status, COMPLETED sense key,
> and DATA CURRENTLY UNAVAILABLE additional sense code. To handle this
> case, scsi_check_sense() is modified to return a SUCCESS disposition so
> that scsi_finish_command() is called to terminate the command.
> In addition, scsi_decide_disposition() has to be modified to see if a
> command being terminated with GOOD status has sense data.
> This behavior is defined in SCSI Primary Commands - 6 (SPC-6), so it
> follows the spec, even though GOOD status commands were not checked for
> sense data before.
> 
> If scsi_check_sense() detects sense data representing a duration limit,
> scsi_check_sense() will set the newly introduced SCSI ML byte
> SCSIML_STAT_DL_TIMEOUT. This SCSI ML byte is checked in
> scsi_noretry_cmd(), so that a command that failed because of a CDL
> timeout cannot be retried. The SCSI ML byte is also checked in
> scsi_result_to_blk_status() to complete the command request with the
> BLK_STS_DURATION_LIMIT status, which results in the user seeing ETIME
> errors for the failed commands.
> 
> Co-developed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
> Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
> Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
> ---
>  drivers/scsi/scsi_error.c | 46 +++++++++++++++++++++++++++++++++++++++
>  drivers/scsi/scsi_lib.c   |  4 ++++
>  drivers/scsi/scsi_priv.h  |  1 +
>  3 files changed, 51 insertions(+)
> 
> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> index 51aa5c1e31b5..1bdab5385985 100644
> --- a/drivers/scsi/scsi_error.c
> +++ b/drivers/scsi/scsi_error.c
> @@ -536,6 +536,7 @@ static inline void set_scsi_ml_byte(struct scsi_cmnd *cmd, u8 status)
>   */
>  enum scsi_disposition scsi_check_sense(struct scsi_cmnd *scmd)
>  {
> +	struct request *req = scsi_cmd_to_rq(scmd);
>  	struct scsi_device *sdev = scmd->device;
>  	struct scsi_sense_hdr sshdr;
>  
> @@ -595,6 +596,22 @@ enum scsi_disposition scsi_check_sense(struct scsi_cmnd *scmd)
>  		if (sshdr.asc == 0x10) /* DIF */
>  			return SUCCESS;
>  
> +		/*
> +		 * Check aborts due to command duration limit policy:
> +		 * ABORTED COMMAND additional sense code with the
> +		 * COMMAND TIMEOUT BEFORE PROCESSING or
> +		 * COMMAND TIMEOUT DURING PROCESSING or
> +		 * COMMAND TIMEOUT DURING PROCESSING DUE TO ERROR RECOVERY
> +		 * additional sense code qualifiers.
> +		 */
> +		if (sshdr.asc == 0x2e &&
> +		    sshdr.ascq >= 0x01 && sshdr.ascq <= 0x03) {
> +			set_scsi_ml_byte(scmd, SCSIML_STAT_DL_TIMEOUT);
> +			req->cmd_flags |= REQ_FAILFAST_DEV;

Why are you setting the REQ_FAILFAST_DEV bit? Does libata check for it?

I thought you might have set it because DID_TIME_OUT was set and you wanted
to hit that check in scsi_noretry_cmd(). However, I see the patch where you
added the new flag so that DID_TIME_OUT is sometimes not set, so you probably
don't hit that path, and you already have the SCSIML_STAT_DL_TIMEOUT check in
scsi_noretry_cmd() below.
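
To make the question concrete, here is a rough user-space model of the retry
decision as I read the quoted scsi_noretry_cmd() hunk. It is only a sketch:
struct model_cmd, MODEL_FAILFAST_DEV and model_noretry() are made-up
stand-ins, not kernel code, and the final failfast test is my assumption
about what follows the hunk. Only the 0x05 value comes from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 0x05 matches the quoted scsi_priv.h hunk; every other name is hypothetical. */
#define SCSIML_STAT_DL_TIMEOUT 0x05
#define MODEL_FAILFAST_DEV     (1u << 0)

struct model_cmd {
	uint8_t ml_byte;        /* stand-in for the SCSI ML byte */
	unsigned int cmd_flags; /* stand-in for req->cmd_flags */
	bool check_condition;   /* stand-in for scsi_status_is_check_condition() */
};

/* Simplified model of scsi_noretry_cmd() from the quoted hunk onwards. */
static bool model_noretry(const struct model_cmd *c)
{
	/* New check from the patch: a CDL timeout is never retried. */
	if (c->ml_byte == SCSIML_STAT_DL_TIMEOUT)
		return true;
	if (!c->check_condition)
		return false;
	/* Assumption: the remaining logic reduces to a failfast test. */
	return (c->cmd_flags & MODEL_FAILFAST_DEV) != 0;
}

/* The result is the same with or without the failfast flag. */
static void model_selfcheck(void)
{
	struct model_cmd with_flag = { SCSIML_STAT_DL_TIMEOUT, MODEL_FAILFAST_DEV, true };
	struct model_cmd no_flag = { SCSIML_STAT_DL_TIMEOUT, 0, true };
	struct model_cmd good_status = { SCSIML_STAT_DL_TIMEOUT, 0, false };

	assert(model_noretry(&with_flag));
	assert(model_noretry(&no_flag));
	assert(model_noretry(&good_status));
}
```

In this model the ML byte check returns before the flag is ever looked at, so
setting the flag changes nothing for this function.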



> +			req->rq_flags |= RQF_QUIET;
> +			return SUCCESS;
> +		}
> +
>  		if (sshdr.asc == 0x44 && sdev->sdev_bflags & BLIST_RETRY_ITF)
>  			return ADD_TO_MLQUEUE;
>  		if (sshdr.asc == 0xc1 && sshdr.ascq == 0x01 &&
> @@ -691,6 +708,15 @@ enum scsi_disposition scsi_check_sense(struct scsi_cmnd *scmd)
>  		}
>  		return SUCCESS;
>  
> +	case COMPLETED:
> +		if (sshdr.asc == 0x55 && sshdr.ascq == 0x0a) {
> +			set_scsi_ml_byte(scmd, SCSIML_STAT_DL_TIMEOUT);
> +			req->cmd_flags |= REQ_FAILFAST_DEV;
> +			req->rq_flags |= RQF_QUIET;
> +			return SUCCESS;
> +		}
> +		return SUCCESS;
> +
>  	default:
>  		return SUCCESS;
>  	}
> @@ -785,6 +811,14 @@ static enum scsi_disposition scsi_eh_completed_normally(struct scsi_cmnd *scmd)
>  	switch (get_status_byte(scmd)) {
>  	case SAM_STAT_GOOD:
>  		scsi_handle_queue_ramp_up(scmd->device);
> +		if (scmd->sense_buffer && SCSI_SENSE_VALID(scmd))
> +			/*
> +			 * If we have sense data, call scsi_check_sense() in
> +			 * order to set the correct SCSI ML byte (if any).
> +			 * No point in checking the return value, since the
> +			 * command has already completed successfully.
> +			 */
> +			scsi_check_sense(scmd);
>  		fallthrough;
>  	case SAM_STAT_COMMAND_TERMINATED:
>  		return SUCCESS;
> @@ -1807,6 +1841,10 @@ bool scsi_noretry_cmd(struct scsi_cmnd *scmd)
>  		return !!(req->cmd_flags & REQ_FAILFAST_DRIVER);
>  	}
>  
> +	/* Never retry commands aborted due to a duration limit timeout */
> +	if (get_scsi_ml_byte(scmd->result) == SCSIML_STAT_DL_TIMEOUT)
> +		return true;
> +
>  	if (!scsi_status_is_check_condition(scmd->result))
>  		return false;
>  
> @@ -1966,6 +2004,14 @@ enum scsi_disposition scsi_decide_disposition(struct scsi_cmnd *scmd)
>  		if (scmd->cmnd[0] == REPORT_LUNS)
>  			scmd->device->sdev_target->expecting_lun_change = 0;
>  		scsi_handle_queue_ramp_up(scmd->device);
> +		if (scmd->sense_buffer && SCSI_SENSE_VALID(scmd))
> +			/*
> +			 * If we have sense data, call scsi_check_sense() in
> +			 * order to set the correct SCSI ML byte (if any).
> +			 * No point in checking the return value, since the
> +			 * command has already completed successfully.
> +			 */
> +			scsi_check_sense(scmd);
>  		fallthrough;
>  	case SAM_STAT_COMMAND_TERMINATED:
>  		return SUCCESS;
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index e64fd8f495d7..4f317c6593aa 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -602,6 +602,8 @@ static blk_status_t scsi_result_to_blk_status(int result)
>  		return BLK_STS_MEDIUM;
>  	case SCSIML_STAT_TGT_FAILURE:
>  		return BLK_STS_TARGET;
> +	case SCSIML_STAT_DL_TIMEOUT:
> +		return BLK_STS_DURATION_LIMIT;
>  	}
>  
>  	switch (host_byte(result)) {
> @@ -799,6 +801,8 @@ static void scsi_io_completion_action(struct scsi_cmnd *cmd, int result)
>  				blk_stat = BLK_STS_ZONE_OPEN_RESOURCE;
>  			}
>  			break;
> +		case COMPLETED:
> +			fallthrough;
>  		default:
>  			action = ACTION_FAIL;
>  			break;
> diff --git a/drivers/scsi/scsi_priv.h b/drivers/scsi/scsi_priv.h
> index 4f97e126c6fb..f8da92428ff6 100644
> --- a/drivers/scsi/scsi_priv.h
> +++ b/drivers/scsi/scsi_priv.h
> @@ -27,6 +27,7 @@ enum scsi_ml_status {
>  	SCSIML_STAT_NOSPC		= 0x02,	/* Space allocation on the dev failed */
>  	SCSIML_STAT_MED_ERROR		= 0x03,	/* Medium error */
>  	SCSIML_STAT_TGT_FAILURE		= 0x04,	/* Permanent target failure */
> +	SCSIML_STAT_DL_TIMEOUT		= 0x05, /* Command Duration Limit timeout */
>  };
>  
>  static inline u8 get_scsi_ml_byte(int result)
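
For reference, going only by the commit message (BLK_STS_DURATION_LIMIT
surfacing as ETIME, hard errors as EIO), a user-space caller could separate
the two cases roughly as below. This is a minimal sketch: the enum and
classify_io_errno() are hypothetical names, and how the errno reaches the
application (read(), io_uring, etc.) is left out.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical classification used only for this illustration. */
enum io_failure_kind {
	IO_OK,
	IO_DURATION_LIMIT, /* "soft" failure: a CDL limit was exceeded */
	IO_HARD_ERROR,     /* EIO, e.g. a hardware problem */
	IO_OTHER,
};

/* Map an errno value from a failed I/O to a failure kind. */
static enum io_failure_kind classify_io_errno(int err)
{
	switch (err) {
	case 0:
		return IO_OK;
	case ETIME:
		return IO_DURATION_LIMIT;
	case EIO:
		return IO_HARD_ERROR;
	default:
		return IO_OTHER;
	}
}

/* Quick self-check of the mapping. */
static void classify_selfcheck(void)
{
	assert(classify_io_errno(ETIME) == IO_DURATION_LIMIT);
	assert(classify_io_errno(EIO) == IO_HARD_ERROR);
}
```

An application would then decide for itself whether to reissue an
IO_DURATION_LIMIT failure, possibly with a different limit, since the kernel
does not retry it.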

