From: Mike Christie <michael.christie@oracle.com>
To: Niklas Cassel <niklas.cassel@wdc.com>,
"James E.J. Bottomley" <jejb@linux.ibm.com>,
"Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Hannes Reinecke <hare@suse.de>,
linux-scsi@vger.kernel.org,
Damien Le Moal <damien.lemoal@opensource.wdc.com>
Subject: Re: [PATCH 23/25] scsi: sd: handle read/write CDL timeout failures
Date: Thu, 8 Dec 2022 18:13:16 -0600
Message-ID: <2fe5212b-308a-54a2-cf44-9b2895132f74@oracle.com>
In-Reply-To: <20221208105947.2399894-24-niklas.cassel@wdc.com>
On 12/8/22 4:59 AM, Niklas Cassel wrote:
> Commands using a duration limit descriptor that has limit policies set
> to a value other than 0x0 may be failed by the device if one of the
> limits are exceeded. For such commands, since the failure is the result
> of the user duration limit configuration and workload, the commands
> should not be retried and terminated immediately. Furthermore, to allow
> the user to differentiate these "soft" failures from hard errors due to
> hardware problem, a different error code than EIO should be returned.
>
> There are 2 cases to consider:
> (1) The failure is due to a limit policy failing the command with a
> check condition sense key, that is, any limit policy other than 0xD.
> For this case, scsi_check_sense() is modified to detect failures with
> the ABORTED COMMAND sense key and the COMMAND TIMEOUT BEFORE PROCESSING
> or COMMAND TIMEOUT DURING PROCESSING or COMMAND TIMEOUT DURING
> PROCESSING DUE TO ERROR RECOVERY additional sense code. For these
> failures, a SUCCESS disposition is returned so that
> scsi_finish_command() is called to terminate the command.
>
> (2) The failure is due to a limit policy set to 0xD, which result in the
> command being terminated with a GOOD status, COMPLETED sense key, and
> DATA CURRENTLY UNAVAILABLE additional sense code. To handle this case,
> the scsi_check_sense() is modified to return a SUCCESS disposition so
> that scsi_finish_command() is called to terminate the command.
> In addition, scsi_decide_disposition() has to be modified to see if a
> command being terminated with GOOD status has sense data.
> This is as defined in SCSI Primary Commands - 6 (SPC-6), so all
> according to spec, even if GOOD status commands were not checked before.
>
> If scsi_check_sense() detects sense data representing a duration limit,
> scsi_check_sense() will set the newly introduced SCSI ML byte
> SCSIML_STAT_DL_TIMEOUT. This SCSI ML byte is checked in
> scsi_noretry_cmd(), so that a command that failed because of a CDL
> timeout cannot be retried. The SCSI ML byte is also checked in
> scsi_result_to_blk_status() to complete the command request with the
> BLK_STS_DURATION_LIMIT status, which result in the user seeing ETIME
> errors for the failed commands.
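
To keep the two cases straight while reading the hunks, here is a rough
userspace sketch of the classification described above. It is not the
kernel code: the model_* names are hypothetical, the sense key values
are the SPC ones (ABORTED COMMAND 0x0b, COMPLETED 0x0f), and the errno
mapping only reflects the ETIME behaviour stated in the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Sense key values per SPC; names are hypothetical stand-ins. */
enum { MODEL_ABORTED_COMMAND = 0x0b, MODEL_COMPLETED = 0x0f };

/* Stand-ins for the SCSI ML status values; 0x05 is taken from the patch. */
enum model_ml_status { MODEL_ML_NONE = 0x00, MODEL_ML_DL_TIMEOUT = 0x05 };

static_assert(MODEL_ML_DL_TIMEOUT <= 0xff, "ML status is stored in one byte");

/* Model of the two checks the patch adds to scsi_check_sense(). */
static enum model_ml_status
model_classify_sense(uint8_t key, uint8_t asc, uint8_t ascq)
{
	/* Case (1): ABORTED COMMAND with COMMAND TIMEOUT ... (2e/01..03). */
	if (key == MODEL_ABORTED_COMMAND && asc == 0x2e &&
	    ascq >= 0x01 && ascq <= 0x03)
		return MODEL_ML_DL_TIMEOUT;

	/* Case (2): policy 0xD, COMPLETED with DATA CURRENTLY UNAVAILABLE. */
	if (key == MODEL_COMPLETED && asc == 0x55 && ascq == 0x0a)
		return MODEL_ML_DL_TIMEOUT;

	return MODEL_ML_NONE;
}

/* Error seen by the submitter in this model: -ETIME on a CDL timeout. */
static int model_ml_status_to_errno(enum model_ml_status ml)
{
	return ml == MODEL_ML_DL_TIMEOUT ? -ETIME : 0;
}
```

In this model both cases collapse to the same ML status, which matches
the description above: the status byte differs (check condition vs.
GOOD), but the block layer result is the same.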
>
> Co-developed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
> Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
> Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
> ---
> drivers/scsi/scsi_error.c | 46 +++++++++++++++++++++++++++++++++++++++
> drivers/scsi/scsi_lib.c | 4 ++++
> drivers/scsi/scsi_priv.h | 1 +
> 3 files changed, 51 insertions(+)
>
> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> index 51aa5c1e31b5..1bdab5385985 100644
> --- a/drivers/scsi/scsi_error.c
> +++ b/drivers/scsi/scsi_error.c
> @@ -536,6 +536,7 @@ static inline void set_scsi_ml_byte(struct scsi_cmnd *cmd, u8 status)
> */
> enum scsi_disposition scsi_check_sense(struct scsi_cmnd *scmd)
> {
> + struct request *req = scsi_cmd_to_rq(scmd);
> struct scsi_device *sdev = scmd->device;
> struct scsi_sense_hdr sshdr;
>
> @@ -595,6 +596,22 @@ enum scsi_disposition scsi_check_sense(struct scsi_cmnd *scmd)
> if (sshdr.asc == 0x10) /* DIF */
> return SUCCESS;
>
> + /*
> + * Check aborts due to command duration limit policy:
> + * ABORTED COMMAND additional sense code with the
> + * COMMAND TIMEOUT BEFORE PROCESSING or
> + * COMMAND TIMEOUT DURING PROCESSING or
> + * COMMAND TIMEOUT DURING PROCESSING DUE TO ERROR RECOVERY
> + * additional sense code qualifiers.
> + */
> + if (sshdr.asc == 0x2e &&
> + sshdr.ascq >= 0x01 && sshdr.ascq <= 0x03) {
> + set_scsi_ml_byte(scmd, SCSIML_STAT_DL_TIMEOUT);
> + req->cmd_flags |= REQ_FAILFAST_DEV;
Why are you setting the REQ_FAILFAST_DEV bit? Does libata check for it?

I thought you might have set it because DID_TIME_OUT was set and you
wanted to hit the DID_TIME_OUT check in scsi_noretry_cmd(). However, I
see the patch where you added the new flag so that DID_TIME_OUT is not
always set, so you probably don't hit that path. You also add the
SCSIML_STAT_DL_TIMEOUT check to scsi_noretry_cmd() below, so the command
is already not retried without the failfast bit.
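
To make the question concrete, here is a simplified userspace model of
the ordering in scsi_noretry_cmd() after this patch. It is only a
sketch: the struct, the MODEL_* constants and the bit used for the
failfast flag are hypothetical, the ML byte is assumed to live in bits
8..15 of the result, and the final failfast check is my assumption about
the part of the function that the hunk does not show.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_ML_DL_TIMEOUT 0x05u /* SCSIML_STAT_DL_TIMEOUT in the patch */
#define MODEL_FAILFAST_DEV  0x01u /* hypothetical bit, not the kernel value */

static_assert(MODEL_ML_DL_TIMEOUT <= 0xff, "ML status must fit in one byte");

struct model_cmd {
	uint32_t result;      /* ML byte assumed to be in bits 8..15 */
	uint32_t cmd_flags;   /* request flags, only FAILFAST_DEV modelled */
	bool check_condition; /* status byte is CHECK CONDITION */
};

static uint8_t model_get_ml_byte(uint32_t result)
{
	return (result >> 8) & 0xff;
}

static void model_set_ml_byte(struct model_cmd *cmd, uint8_t status)
{
	cmd->result = (cmd->result & 0xffff00ffu) | ((uint32_t)status << 8);
}

/* Returns true when the command must not be retried. */
static bool model_noretry_cmd(const struct model_cmd *cmd)
{
	/* Check added by the patch: never retry a duration limit timeout. */
	if (model_get_ml_byte(cmd->result) == MODEL_ML_DL_TIMEOUT)
		return true;

	if (!cmd->check_condition)
		return false;

	/* Assumed remainder: honour the submitter's failfast request. */
	return (cmd->cmd_flags & MODEL_FAILFAST_DEV) != 0;
}
```

In this model the ML byte check returns before the failfast bit is ever
looked at, so setting MODEL_FAILFAST_DEV does not change the outcome for
a CDL timeout. If the real function behaves the same way, the bit only
matters if something else (libata?) looks at it.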
> + req->rq_flags |= RQF_QUIET;
> + return SUCCESS;
> + }
> +
> if (sshdr.asc == 0x44 && sdev->sdev_bflags & BLIST_RETRY_ITF)
> return ADD_TO_MLQUEUE;
> if (sshdr.asc == 0xc1 && sshdr.ascq == 0x01 &&
> @@ -691,6 +708,15 @@ enum scsi_disposition scsi_check_sense(struct scsi_cmnd *scmd)
> }
> return SUCCESS;
>
> + case COMPLETED:
> + if (sshdr.asc == 0x55 && sshdr.ascq == 0x0a) {
> + set_scsi_ml_byte(scmd, SCSIML_STAT_DL_TIMEOUT);
> + req->cmd_flags |= REQ_FAILFAST_DEV;
> + req->rq_flags |= RQF_QUIET;
> + return SUCCESS;
> + }
> + return SUCCESS;
> +
> default:
> return SUCCESS;
> }
> @@ -785,6 +811,14 @@ static enum scsi_disposition scsi_eh_completed_normally(struct scsi_cmnd *scmd)
> switch (get_status_byte(scmd)) {
> case SAM_STAT_GOOD:
> scsi_handle_queue_ramp_up(scmd->device);
> + if (scmd->sense_buffer && SCSI_SENSE_VALID(scmd))
> + /*
> + * If we have sense data, call scsi_check_sense() in
> + * order to set the correct SCSI ML byte (if any).
> + * No point in checking the return value, since the
> + * command has already completed successfully.
> + */
> + scsi_check_sense(scmd);
> fallthrough;
> case SAM_STAT_COMMAND_TERMINATED:
> return SUCCESS;
> @@ -1807,6 +1841,10 @@ bool scsi_noretry_cmd(struct scsi_cmnd *scmd)
> return !!(req->cmd_flags & REQ_FAILFAST_DRIVER);
> }
>
> + /* Never retry commands aborted due to a duration limit timeout */
> + if (get_scsi_ml_byte(scmd->result) == SCSIML_STAT_DL_TIMEOUT)
> + return true;
> +
> if (!scsi_status_is_check_condition(scmd->result))
> return false;
>
> @@ -1966,6 +2004,14 @@ enum scsi_disposition scsi_decide_disposition(struct scsi_cmnd *scmd)
> if (scmd->cmnd[0] == REPORT_LUNS)
> scmd->device->sdev_target->expecting_lun_change = 0;
> scsi_handle_queue_ramp_up(scmd->device);
> + if (scmd->sense_buffer && SCSI_SENSE_VALID(scmd))
> + /*
> + * If we have sense data, call scsi_check_sense() in
> + * order to set the correct SCSI ML byte (if any).
> + * No point in checking the return value, since the
> + * command has already completed successfully.
> + */
> + scsi_check_sense(scmd);
> fallthrough;
> case SAM_STAT_COMMAND_TERMINATED:
> return SUCCESS;
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index e64fd8f495d7..4f317c6593aa 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -602,6 +602,8 @@ static blk_status_t scsi_result_to_blk_status(int result)
> return BLK_STS_MEDIUM;
> case SCSIML_STAT_TGT_FAILURE:
> return BLK_STS_TARGET;
> + case SCSIML_STAT_DL_TIMEOUT:
> + return BLK_STS_DURATION_LIMIT;
> }
>
> switch (host_byte(result)) {
> @@ -799,6 +801,8 @@ static void scsi_io_completion_action(struct scsi_cmnd *cmd, int result)
> blk_stat = BLK_STS_ZONE_OPEN_RESOURCE;
> }
> break;
> + case COMPLETED:
> + fallthrough;
> default:
> action = ACTION_FAIL;
> break;
> diff --git a/drivers/scsi/scsi_priv.h b/drivers/scsi/scsi_priv.h
> index 4f97e126c6fb..f8da92428ff6 100644
> --- a/drivers/scsi/scsi_priv.h
> +++ b/drivers/scsi/scsi_priv.h
> @@ -27,6 +27,7 @@ enum scsi_ml_status {
> SCSIML_STAT_NOSPC = 0x02, /* Space allocation on the dev failed */
> SCSIML_STAT_MED_ERROR = 0x03, /* Medium error */
> SCSIML_STAT_TGT_FAILURE = 0x04, /* Permanent target failure */
> + SCSIML_STAT_DL_TIMEOUT = 0x05, /* Command Duration Limit timeout */
> };
>
> static inline u8 get_scsi_ml_byte(int result)