[PATCH v12 00/20] scsi: Allow scsi_execute users to request retries

public inbox for linux-scsi@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH v12 00/20] scsi: Allow scsi_execute users to request retries
@ 2023-11-14  1:37 Mike Christie
  2023-11-14  1:37 ` [PATCH v12 01/20] scsi: Allow passthrough to request scsi-ml retries Mike Christie
                   ` (20 more replies)
  0 siblings, 21 replies; 32+ messages in thread
From: Mike Christie @ 2023-11-14  1:37 UTC (permalink / raw)
  To: mwilck, john.g.garry, bvanassche, hch, martin.petersen,
	linux-scsi, james.bottomley

The following patches were made over Martin's 6.7 staging branch
which contains a sd change that this patch requires.

The patches allow scsi_execute_cmd users to have scsi-ml retry the
cmd for it instead of the caller having to parse the error and loop
itself.

Note that I dropped most reviewed-by tags, because the structs changed
where before we only had struct scsi_failure we now have the struct
scsi_failure that is in:

struct scsi_failures {
        /*
         * If the failure does not have a specific limit in the scsi_failure
         * then this limit is followed.
         */
        int total_allowed;
        int total_retries;
        struct scsi_failure *failure_definitions;
};

so we can limit the total number of retries. The setup for this is
then different and I'm not sure if we everyone will like it. The
other parts of the conversions patches have not changed.


 drivers/scsi/Kconfig                        |   9 +
 drivers/scsi/ch.c                           |  27 ++-
 drivers/scsi/device_handler/scsi_dh_hp_sw.c |  49 +++--
 drivers/scsi/device_handler/scsi_dh_rdac.c  |  84 +++----
 drivers/scsi/scsi_lib.c                     |  26 ++-
 drivers/scsi/scsi_lib_test.c                | 330 ++++++++++++++++++++++++++++
 drivers/scsi/scsi_scan.c                    | 107 +++++----
 drivers/scsi/scsi_transport_spi.c           |  35 +--
 drivers/scsi/sd.c                           | 326 +++++++++++++++++----------
 drivers/scsi/ses.c                          |  66 ++++--
 drivers/scsi/sr.c                           |  38 ++--
 drivers/ufs/core/ufshcd.c                   |  22 +-
 12 files changed, 822 insertions(+), 297 deletions(-)

v12:
- Fix bug where a user could request no retries and skip going
through the scsi-eh.
- Drop support for allowing caller to have scsi-ml not retry
failures (we only allow caller to request retries).
- Fix formatting.
- Add support to control total number of retries.
- Fix kunit tests to add a missing test and comments.
- Fix missing SCMD_FAILURE_ASCQ_ANY in sd_spinup_disk.

v11:
- Document scsi_failure.result special values
- Fix sshdr fix git commit message where there was a missing word
- Use designated initializers for cdb setup
- Fix up various coding style comments from John like redoing if/else
error/success checks.
- Add patch to fix rdac issue where stale SCSH_DH values were returned
- Remove old comment from:
"[PATCH v10 16/33] scsi: spi: Have scsi-ml retry spi_execute errors"
- Drop EOPNOTSUPP use from:
"[PATCH v10 17/33] scsi: sd: Fix sshdr use in sd_suspend_common"
- Init errno to 0 when declared in:
"[PATCH v10 20/33] scsi: ch: Have scsi-ml retry ch_do_scsi errors"
- Add diffstat below

v10:
- Drop "," after {}.
- Hopefully fix outlook issues.

v9:
- Drop spi_execute changes from [PATCH] scsi: spi: Fix sshdr use
- Change git commit message for sshdr use fixes.

v8:
- Rebase.
- Saw the discussion about the possible bug where callers are
accessing the sshdr when it's not setup, so I added some patches
for that since I was going over the same code.

v7:
- Rebase against scsi_execute_cmd patchset.

v6:
- Fix kunit build when it's built as a module but scsi is not.
- Drop [PATCH 17/35] scsi: ufshcd: Convert to scsi_exec_req because
  that driver no longer uses scsi_execute.
- Convert ufshcd to use the scsi_failures struct because it now just does
  direct retries and does not do it's own deadline timing.
- Add back memset in read_capacity_16.
- Remove memset in get_sectorsize and replace with { } to init buf.

v5:
- Fix spelling (made sure I ran checkpatch strict)
- Drop SCMD_FAILURE_NONE
- Rename SCMD_FAILURE_ANY
- Fix media_not_present handling where it was being retried instead of
failed.
- Fix ILLEGAL_REQUEST handling in read_capacity_16 so it was not retried.
- Fix coding style, spelling and and naming convention in kunit and added
  more tests to handle cases like the media_not_present one where we want
  to force failures instead of retries.
- Drop cxlflash patch because it actually checked it's internal state before
  performing a retry which we currently do not support.

v4:
- Redefine cmd definitions if the cmd is touched.
- Fix up coding style issues.
- Use sam_status enum.
- Move failures initialization to scsi_initialize_rq
(also fixes KASAN error).
- Add kunit test.
- Add function comments.

v3:
- Use a for loop in scsi_check_passthrough
- Fix result handling/testing.
- Fix scsi_status_is_good handling.
- make __scsi_exec_req take a const arg
- Fix formatting in patch 24

v2:
- Rename scsi_prep_sense
- Change scsi_check_passthrough's loop and added some fixes
- Modified scsi_execute* so it uses a struct to pass in args




^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v12 01/20] scsi: Allow passthrough to request scsi-ml retries
  2023-11-14  1:37 [PATCH v12 00/20] scsi: Allow scsi_execute users to request retries Mike Christie
@ 2023-11-14  1:37 ` Mike Christie
  2023-11-16 10:53   ` John Garry
  2023-11-14  1:37 ` [PATCH v12 02/20] scsi: Have scsi-ml retry scsi_probe_lun errors Mike Christie
                   ` (19 subsequent siblings)
  20 siblings, 1 reply; 32+ messages in thread
From: Mike Christie @ 2023-11-14  1:37 UTC (permalink / raw)
  To: mwilck, john.g.garry, bvanassche, hch, martin.petersen,
	linux-scsi, james.bottomley
  Cc: Mike Christie

For passthrough we don't retry any error we get a check condition for.
This results in a lot of callers driving their own retries for all UAs,
specific UAs, NOT_READY, specific sense values or any type of failure.

This adds the core code to allow passthrough users to specify what errors
they want scsi-ml to retry for them. We can then convert users to drop
a lot of their sense parsing and retry handling.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/scsi/scsi_lib.c    | 92 ++++++++++++++++++++++++++++++++++++++
 include/scsi/scsi_device.h | 48 ++++++++++++++++++++
 2 files changed, 140 insertions(+)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index cf3864f72093..dee43c6f7ad0 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -184,6 +184,92 @@ void scsi_queue_insert(struct scsi_cmnd *cmd, int reason)
 	__scsi_queue_insert(cmd, reason, true);
 }
 
+void scsi_reset_failures(struct scsi_failures *failures)
+{
+	struct scsi_failure *failure;
+
+	failures->total_retries = 0;
+
+	for (failure = failures->failure_definitions; failure->result;
+	     failure++)
+		failure->retries = 0;
+}
+EXPORT_SYMBOL_GPL(scsi_reset_failures);
+
+/**
+ * scsi_check_passthrough - Determine if passthrough scsi_cmnd needs a retry.
+ * @scmd: scsi_cmnd to check.
+ * @failures: scsi_failures struct that lists failures to check for.
+ *
+ * Returns -EAGAIN if the caller should retry else 0.
+ */
+static int scsi_check_passthrough(struct scsi_cmnd *scmd,
+				  struct scsi_failures *failures)
+{
+	struct scsi_failure *failure;
+	struct scsi_sense_hdr sshdr;
+	enum sam_status status;
+
+	if (!failures)
+		return 0;
+
+	for (failure = failures->failure_definitions; failure->result;
+	     failure++) {
+		if (failure->result == SCMD_FAILURE_RESULT_ANY)
+			goto maybe_retry;
+
+		if (host_byte(scmd->result) &&
+		    host_byte(scmd->result) == host_byte(failure->result))
+			goto maybe_retry;
+
+		status = status_byte(scmd->result);
+		if (!status)
+			continue;
+
+		if (failure->result == SCMD_FAILURE_STAT_ANY &&
+		    !scsi_status_is_good(scmd->result))
+			goto maybe_retry;
+
+		if (status != status_byte(failure->result))
+			continue;
+
+		if (status_byte(failure->result) != SAM_STAT_CHECK_CONDITION ||
+		    failure->sense == SCMD_FAILURE_SENSE_ANY)
+			goto maybe_retry;
+
+		if (!scsi_command_normalize_sense(scmd, &sshdr))
+			return 0;
+
+		if (failure->sense != sshdr.sense_key)
+			continue;
+
+		if (failure->asc == SCMD_FAILURE_ASC_ANY)
+			goto maybe_retry;
+
+		if (failure->asc != sshdr.asc)
+			continue;
+
+		if (failure->ascq == SCMD_FAILURE_ASCQ_ANY ||
+		    failure->ascq == sshdr.ascq)
+			goto maybe_retry;
+	}
+
+	return 0;
+
+maybe_retry:
+	if (failure->allowed) {
+		if (failure->allowed == SCMD_FAILURE_NO_LIMIT ||
+		    ++failure->retries <= failure->allowed)
+			return -EAGAIN;
+	} else {
+		if (failures->total_allowed == SCMD_FAILURE_NO_LIMIT ||
+		    ++failures->total_retries <= failures->total_allowed)
+			return -EAGAIN;
+	}
+
+	return 0;
+}
+
 /**
  * scsi_execute_cmd - insert request and wait for the result
  * @sdev:	scsi_device
@@ -214,6 +300,7 @@ int scsi_execute_cmd(struct scsi_device *sdev, const unsigned char *cmd,
 			      args->sense_len != SCSI_SENSE_BUFFERSIZE))
 		return -EINVAL;
 
+retry:
 	req = scsi_alloc_request(sdev->request_queue, opf, args->req_flags);
 	if (IS_ERR(req))
 		return PTR_ERR(req);
@@ -237,6 +324,11 @@ int scsi_execute_cmd(struct scsi_device *sdev, const unsigned char *cmd,
 	 */
 	blk_execute_rq(req, true);
 
+	if (scsi_check_passthrough(scmd, args->failures) == -EAGAIN) {
+		blk_mq_free_request(req);
+		goto retry;
+	}
+
 	/*
 	 * Some devices (USB mass-storage in particular) may transfer
 	 * garbage data together with a residue indicating that the data
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index 10480eb582b2..c92d6d9e644e 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -483,6 +483,52 @@ extern int scsi_is_sdev_device(const struct device *);
 extern int scsi_is_target_device(const struct device *);
 extern void scsi_sanitize_inquiry_string(unsigned char *s, int len);
 
+/*
+ * scsi_execute_cmd users can set scsi_failure.result to have
+ * scsi_check_passthrough fail/retry a command. scsi_failure.result can be a
+ * specific host byte or message code, or SCMD_FAILURE_RESULT_ANY can be used
+ * to match any host or message code.
+ */
+#define SCMD_FAILURE_RESULT_ANY	0x7fffffff
+/*
+ * Set scsi_failure.result to SCMD_FAILURE_STAT_ANY to fail/retry any failure
+ * scsi_status_is_good returns false for.
+ */
+#define SCMD_FAILURE_STAT_ANY	0xff
+/*
+ * The following can be set to the scsi_failure sense, asc and ascq fields to
+ * match on any sense, ASC, or ASCQ value.
+ */
+#define SCMD_FAILURE_SENSE_ANY	0xff
+#define SCMD_FAILURE_ASC_ANY	0xff
+#define SCMD_FAILURE_ASCQ_ANY	0xff
+/* Always retry a matching failure. */
+#define SCMD_FAILURE_NO_LIMIT	-1
+
+struct scsi_failure {
+	int result;
+	u8 sense;
+	u8 asc;
+	u8 ascq;
+
+	/*
+	 * Number of attempts allowed for this failure. It does not count
+	 * for the total_allowed.
+	 */
+	s8 allowed;
+	s8 retries;
+};
+
+struct scsi_failures {
+	/*
+	 * If the failure does not have a specific limit in the scsi_failure
+	 * then this limit is followed.
+	 */
+	int total_allowed;
+	int total_retries;
+	struct scsi_failure *failure_definitions;
+};
+
 /* Optional arguments to scsi_execute_cmd */
 struct scsi_exec_args {
 	unsigned char *sense;		/* sense buffer */
@@ -491,12 +537,14 @@ struct scsi_exec_args {
 	blk_mq_req_flags_t req_flags;	/* BLK_MQ_REQ flags */
 	int scmd_flags;			/* SCMD flags */
 	int *resid;			/* residual length */
+	struct scsi_failures *failures;	/* failures to retry */
 };
 
 int scsi_execute_cmd(struct scsi_device *sdev, const unsigned char *cmd,
 		     blk_opf_t opf, void *buffer, unsigned int bufflen,
 		     int timeout, int retries,
 		     const struct scsi_exec_args *args);
+void scsi_reset_failures(struct scsi_failures *failures);
 
 extern void sdev_disable_disk_events(struct scsi_device *sdev);
 extern void sdev_enable_disk_events(struct scsi_device *sdev);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v12 02/20] scsi: Have scsi-ml retry scsi_probe_lun errors
  2023-11-14  1:37 [PATCH v12 00/20] scsi: Allow scsi_execute users to request retries Mike Christie
  2023-11-14  1:37 ` [PATCH v12 01/20] scsi: Allow passthrough to request scsi-ml retries Mike Christie
@ 2023-11-14  1:37 ` Mike Christie
  2023-11-16 11:14   ` John Garry
  2023-11-14  1:37 ` [PATCH v12 03/20] scsi: retry INQUIRY after timeout Mike Christie
                   ` (18 subsequent siblings)
  20 siblings, 1 reply; 32+ messages in thread
From: Mike Christie @ 2023-11-14  1:37 UTC (permalink / raw)
  To: mwilck, john.g.garry, bvanassche, hch, martin.petersen,
	linux-scsi, james.bottomley
  Cc: Mike Christie

This has scsi_probe_lun ask scsi-ml to retry UAs instead of driving them
itself.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/scsi/scsi_scan.c | 46 ++++++++++++++++++++++++----------------
 1 file changed, 28 insertions(+), 18 deletions(-)

diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index 44680f65ea14..eeb53c28581f 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -626,6 +626,7 @@ void scsi_sanitize_inquiry_string(unsigned char *s, int len)
 }
 EXPORT_SYMBOL(scsi_sanitize_inquiry_string);
 
+
 /**
  * scsi_probe_lun - probe a single LUN using a SCSI INQUIRY
  * @sdev:	scsi_device to probe
@@ -647,10 +648,32 @@ static int scsi_probe_lun(struct scsi_device *sdev, unsigned char *inq_result,
 	int first_inquiry_len, try_inquiry_len, next_inquiry_len;
 	int response_len = 0;
 	int pass, count, result, resid;
-	struct scsi_sense_hdr sshdr;
+	struct scsi_failure failure_defs[] = {
+		/*
+		 * not-ready to ready transition [asc/ascq=0x28/0x0] or
+		 * power-on, reset [asc/ascq=0x29/0x0], continue. INQUIRY
+		 * should not yield UNIT_ATTENTION but many buggy devices do
+		 * so anyway.
+		 */
+		{
+			.sense = UNIT_ATTENTION,
+			.asc = 0x28,
+			.result = SAM_STAT_CHECK_CONDITION,
+		},
+		{
+			.sense = UNIT_ATTENTION,
+			.asc = 0x29,
+			.result = SAM_STAT_CHECK_CONDITION,
+		},
+		{}
+	};
+	struct scsi_failures failures = {
+		.total_allowed = 3,
+		.failure_definitions = failure_defs,
+	};
 	const struct scsi_exec_args exec_args = {
-		.sshdr = &sshdr,
 		.resid = &resid,
+		.failures = &failures,
 	};
 
 	*bflags = 0;
@@ -668,6 +691,8 @@ static int scsi_probe_lun(struct scsi_device *sdev, unsigned char *inq_result,
 				pass, try_inquiry_len));
 
 	/* Each pass gets up to three chances to ignore Unit Attention */
+	scsi_reset_failures(&failures);
+
 	for (count = 0; count < 3; ++count) {
 		memset(scsi_cmd, 0, 6);
 		scsi_cmd[0] = INQUIRY;
@@ -684,22 +709,7 @@ static int scsi_probe_lun(struct scsi_device *sdev, unsigned char *inq_result,
 				"scsi scan: INQUIRY %s with code 0x%x\n",
 				result ? "failed" : "successful", result));
 
-		if (result > 0) {
-			/*
-			 * not-ready to ready transition [asc/ascq=0x28/0x0]
-			 * or power-on, reset [asc/ascq=0x29/0x0], continue.
-			 * INQUIRY should not yield UNIT_ATTENTION
-			 * but many buggy devices do so anyway. 
-			 */
-			if (scsi_status_is_check_condition(result) &&
-			    scsi_sense_valid(&sshdr)) {
-				if ((sshdr.sense_key == UNIT_ATTENTION) &&
-				    ((sshdr.asc == 0x28) ||
-				     (sshdr.asc == 0x29)) &&
-				    (sshdr.ascq == 0))
-					continue;
-			}
-		} else if (result == 0) {
+		if (result == 0) {
 			/*
 			 * if nothing was transferred, we try
 			 * again. It's a workaround for some USB
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v12 03/20] scsi: retry INQUIRY after timeout
  2023-11-14  1:37 [PATCH v12 00/20] scsi: Allow scsi_execute users to request retries Mike Christie
  2023-11-14  1:37 ` [PATCH v12 01/20] scsi: Allow passthrough to request scsi-ml retries Mike Christie
  2023-11-14  1:37 ` [PATCH v12 02/20] scsi: Have scsi-ml retry scsi_probe_lun errors Mike Christie
@ 2023-11-14  1:37 ` Mike Christie
  2023-11-14  1:37 ` [PATCH v12 04/20] scsi: sd: Have scsi-ml retry read_capacity_16 errors Mike Christie
                   ` (17 subsequent siblings)
  20 siblings, 0 replies; 32+ messages in thread
From: Mike Christie @ 2023-11-14  1:37 UTC (permalink / raw)
  To: mwilck, john.g.garry, bvanassche, hch, martin.petersen,
	linux-scsi, james.bottomley
  Cc: Mike Christie

Description from: Martin Wilck <mwilck@suse.com>:

The SCSI mid layer doesn't retry commands after DID_TIME_OUT (see
scsi_noretry_cmd()). Packet loss in the fabric can cause spurious timeouts
during SCSI device probing, causing device probing to fail. This has been
observed in FCoE uplink failover tests, for example.

This patch fixes the issue by retrying the INQUIRY.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin Wilck <mwilck@suse.com>
---
 drivers/scsi/scsi_scan.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index eeb53c28581f..8da0990b2dde 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -665,6 +665,10 @@ static int scsi_probe_lun(struct scsi_device *sdev, unsigned char *inq_result,
 			.asc = 0x29,
 			.result = SAM_STAT_CHECK_CONDITION,
 		},
+		{
+			.allowed = 1,
+			.result = DID_TIME_OUT << 16,
+		},
 		{}
 	};
 	struct scsi_failures failures = {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v12 04/20] scsi: sd: Have scsi-ml retry read_capacity_16 errors
  2023-11-14  1:37 [PATCH v12 00/20] scsi: Allow scsi_execute users to request retries Mike Christie
                   ` (2 preceding siblings ...)
  2023-11-14  1:37 ` [PATCH v12 03/20] scsi: retry INQUIRY after timeout Mike Christie
@ 2023-11-14  1:37 ` Mike Christie
  2023-11-16 11:39   ` John Garry
  2023-11-14  1:37 ` [PATCH v12 05/20] scsi: Use separate buf for START_STOP in sd_spinup_disk Mike Christie
                   ` (16 subsequent siblings)
  20 siblings, 1 reply; 32+ messages in thread
From: Mike Christie @ 2023-11-14  1:37 UTC (permalink / raw)
  To: mwilck, john.g.garry, bvanassche, hch, martin.petersen,
	linux-scsi, james.bottomley
  Cc: Mike Christie

This has read_capacity_16 have scsi-ml retry errors instead of driving
them itself.

There are 2 behavior changes with this patch:
1. There is one behavior change where we no longer retry when
scsi_execute_cmd returns < 0, but we should be ok. We don't need to retry
for failures like the queue being removed, and for the case where there
are no tags/reqs since the block layer waits/retries for us. For possible
memory allocation failures from blk_rq_map_kern we use GFP_NOIO, so
retrying will probably not help.
2. For the specific UAs we checked for and retried, we would get
READ_CAPACITY_RETRIES_ON_RESET retries plus whatever retries were left
from the main loop's retries. Each UA now gets
READ_CAPACITY_RETRIES_ON_RESET reties, and the other errors get up to 3
retries. This is most likely ok, because READ_CAPACITY_RETRIES_ON_RESET
is already 10 and is not based on anything specific like a spec or
device, so the extra 3 we got from the main loop was probably just an
accident and is not going to help.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/scsi/sd.c | 108 ++++++++++++++++++++++++++++++----------------
 1 file changed, 71 insertions(+), 37 deletions(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index fa00dd503cbf..1af04b01e1df 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -2453,55 +2453,89 @@ static void read_capacity_error(struct scsi_disk *sdkp, struct scsi_device *sdp,
 static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp,
 						unsigned char *buffer)
 {
-	unsigned char cmd[16];
-	struct scsi_sense_hdr sshdr;
-	const struct scsi_exec_args exec_args = {
-		.sshdr = &sshdr,
+	static const u8 cmd[16] = {
+		[0] = SERVICE_ACTION_IN_16,
+		[1] = SAI_READ_CAPACITY_16,
+		[13] = RC16_LEN,
 	};
+	struct scsi_sense_hdr sshdr;
 	int sense_valid = 0;
 	int the_result;
-	int retries = 3, reset_retries = READ_CAPACITY_RETRIES_ON_RESET;
 	unsigned int alignment;
 	unsigned long long lba;
 	unsigned sector_size;
+	struct scsi_failure failure_defs[] = {
+		/*
+		 * Do not retry Invalid Command Operation Code or Invalid
+		 * Field in CDB.
+		 */
+		{
+			.sense = ILLEGAL_REQUEST,
+			.asc = 0x20,
+			.result = SAM_STAT_CHECK_CONDITION,
+		},
+		{
+			.sense = ILLEGAL_REQUEST,
+			.asc = 0x24,
+			.result = SAM_STAT_CHECK_CONDITION,
+		},
+		/* Do not retry Medium Not Present */
+		{
+			.sense = UNIT_ATTENTION,
+			.asc = 0x3A,
+			.result = SAM_STAT_CHECK_CONDITION,
+		},
+		{
+			.sense = NOT_READY,
+			.asc = 0x3A,
+			.result = SAM_STAT_CHECK_CONDITION,
+		},
+		/* Device reset might occur several times so retry a lot */
+		{
+			.sense = UNIT_ATTENTION,
+			.asc = 0x29,
+			.allowed = READ_CAPACITY_RETRIES_ON_RESET,
+			.result = SAM_STAT_CHECK_CONDITION,
+		},
+		/* Any other error not listed above retry 3 times */
+		{
+			.result = SCMD_FAILURE_RESULT_ANY,
+			.allowed = 3,
+		},
+		{}
+	};
+	struct scsi_failures failures = {
+		.failure_definitions = failure_defs,
+	};
+	const struct scsi_exec_args exec_args = {
+		.sshdr = &sshdr,
+		.failures = &failures,
+	};
 
 	if (sdp->no_read_capacity_16)
 		return -EINVAL;
 
-	do {
-		memset(cmd, 0, 16);
-		cmd[0] = SERVICE_ACTION_IN_16;
-		cmd[1] = SAI_READ_CAPACITY_16;
-		cmd[13] = RC16_LEN;
-		memset(buffer, 0, RC16_LEN);
-
-		the_result = scsi_execute_cmd(sdp, cmd, REQ_OP_DRV_IN,
-					      buffer, RC16_LEN, SD_TIMEOUT,
-					      sdkp->max_retries, &exec_args);
-		if (the_result > 0) {
-			if (media_not_present(sdkp, &sshdr))
-				return -ENODEV;
+	memset(buffer, 0, RC16_LEN);
 
-			sense_valid = scsi_sense_valid(&sshdr);
-			if (sense_valid &&
-			    sshdr.sense_key == ILLEGAL_REQUEST &&
-			    (sshdr.asc == 0x20 || sshdr.asc == 0x24) &&
-			    sshdr.ascq == 0x00)
-				/* Invalid Command Operation Code or
-				 * Invalid Field in CDB, just retry
-				 * silently with RC10 */
-				return -EINVAL;
-			if (sense_valid &&
-			    sshdr.sense_key == UNIT_ATTENTION &&
-			    sshdr.asc == 0x29 && sshdr.ascq == 0x00)
-				/* Device reset might occur several times,
-				 * give it one more chance */
-				if (--reset_retries > 0)
-					continue;
-		}
-		retries--;
+	the_result = scsi_execute_cmd(sdp, cmd, REQ_OP_DRV_IN, buffer,
+				      RC16_LEN, SD_TIMEOUT, sdkp->max_retries,
+				      &exec_args);
 
-	} while (the_result && retries);
+	if (the_result > 0) {
+		if (media_not_present(sdkp, &sshdr))
+			return -ENODEV;
+
+		sense_valid = scsi_sense_valid(&sshdr);
+		if (sense_valid && sshdr.sense_key == ILLEGAL_REQUEST &&
+		    (sshdr.asc == 0x20 || sshdr.asc == 0x24) &&
+		     sshdr.ascq == 0x00) {
+			/*
+			 * Invalid Command Operation Code or Invalid Field in
+			 * CDB, just retry silently with RC10
+			 */
+			return -EINVAL;
+		}
+	}
 
 	if (the_result) {
 		sd_print_result(sdkp, "Read Capacity(16) failed", the_result);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v12 05/20] scsi: Use separate buf for START_STOP in sd_spinup_disk
  2023-11-14  1:37 [PATCH v12 00/20] scsi: Allow scsi_execute users to request retries Mike Christie
                   ` (3 preceding siblings ...)
  2023-11-14  1:37 ` [PATCH v12 04/20] scsi: sd: Have scsi-ml retry read_capacity_16 errors Mike Christie
@ 2023-11-14  1:37 ` Mike Christie
  2023-11-14  1:37 ` [PATCH v12 06/20] scsi: Have scsi-ml retry sd_spinup_disk errors Mike Christie
                   ` (15 subsequent siblings)
  20 siblings, 0 replies; 32+ messages in thread
From: Mike Christie @ 2023-11-14  1:37 UTC (permalink / raw)
  To: mwilck, john.g.garry, bvanassche, hch, martin.petersen,
	linux-scsi, james.bottomley
  Cc: Mike Christie

We currently re-use the cmd buffer for the TUR and START_STOP commands
which requires us to reset the buffer when retrying. This has us use
separate buffers for the 2 commands so we can make them const and I think
it makes it easier to handle for retries but does not add too much extra
to the stack use.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Martin Wilck <mwilck@suse.com>
---
 drivers/scsi/sd.c | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 1af04b01e1df..641f9c9c0674 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -2318,14 +2318,16 @@ sd_spinup_disk(struct scsi_disk *sdkp)
 			 * Issue command to spin up drive when not ready
 			 */
 			if (!spintime) {
+				/* Return immediately and start spin cycle */
+				const u8 start_cmd[10] = {
+					[0] = START_STOP,
+					[1] = 1,
+					[4] = sdkp->device->start_stop_pwr_cond ?
+						0x11 : 1,
+				};
+
 				sd_printk(KERN_NOTICE, sdkp, "Spinning up disk...");
-				cmd[0] = START_STOP;
-				cmd[1] = 1;	/* Return immediately */
-				memset((void *) &cmd[2], 0, 8);
-				cmd[4] = 1;	/* Start spin cycle */
-				if (sdkp->device->start_stop_pwr_cond)
-					cmd[4] |= 1 << 4;
-				scsi_execute_cmd(sdkp->device, cmd,
+				scsi_execute_cmd(sdkp->device, start_cmd,
 						 REQ_OP_DRV_IN, NULL, 0,
 						 SD_TIMEOUT, sdkp->max_retries,
 						 &exec_args);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v12 06/20] scsi: Have scsi-ml retry sd_spinup_disk errors
  2023-11-14  1:37 [PATCH v12 00/20] scsi: Allow scsi_execute users to request retries Mike Christie
                   ` (4 preceding siblings ...)
  2023-11-14  1:37 ` [PATCH v12 05/20] scsi: Use separate buf for START_STOP in sd_spinup_disk Mike Christie
@ 2023-11-14  1:37 ` Mike Christie
  2023-11-16 12:13   ` John Garry
  2023-11-14  1:37 ` [PATCH v12 07/20] scsi: hp_sw: Have scsi-ml retry scsi_execute_cmd errors Mike Christie
                   ` (14 subsequent siblings)
  20 siblings, 1 reply; 32+ messages in thread
From: Mike Christie @ 2023-11-14  1:37 UTC (permalink / raw)
  To: mwilck, john.g.garry, bvanassche, hch, martin.petersen,
	linux-scsi, james.bottomley
  Cc: Mike Christie

This simplifies sd_spinup_disk so scsi-ml retries errors for it. Note that
we retried specifically on a UA and also if scsi_status_is_good returned
failed which would happen for all check conditions. In this patch we use
SCMD_FAILURE_STAT_ANY which will trigger for the same conditions as
when scsi_status_is_good returns false and there is status. This will
cover all CCs including UAs so there is no explicit failures arrary entry
for UAs.

There is one behavior change where we no longer retry when
scsi_execute_cmd returns < 0, but we should be ok. We don't need to retry
for failures like the queue being removed, and for the case where there
are no tags/reqs the block layer waits/retries for us. For possible memory
allocation failures from blk_rq_map_kern we use GFP_NOIO, so retrying
will probably not help.

We do not handle the outside loop's retries because we want to sleep
between tries and we don't support that yet.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/scsi/sd.c | 77 +++++++++++++++++++++++++++--------------------
 1 file changed, 45 insertions(+), 32 deletions(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 641f9c9c0674..cda0d029ab7f 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -2235,55 +2235,68 @@ static int sd_done(struct scsi_cmnd *SCpnt)
 static void
 sd_spinup_disk(struct scsi_disk *sdkp)
 {
-	unsigned char cmd[10];
+	static const u8 cmd[10] = { TEST_UNIT_READY };
 	unsigned long spintime_expire = 0;
-	int retries, spintime;
+	int spintime, sense_valid = 0;
 	unsigned int the_result;
 	struct scsi_sense_hdr sshdr;
+	struct scsi_failure failure_defs[] = {
+		/* Do not retry Medium Not Present */
+		{
+			.sense = UNIT_ATTENTION,
+			.asc = 0x3A,
+			.ascq = SCMD_FAILURE_ASCQ_ANY,
+			.result = SAM_STAT_CHECK_CONDITION,
+		},
+		{
+			.sense = NOT_READY,
+			.asc = 0x3A,
+			.ascq = SCMD_FAILURE_ASCQ_ANY,
+			.result = SAM_STAT_CHECK_CONDITION,
+		},
+		/* Retry when scsi_status_is_good would return false 3 times */
+		{
+			.result = SCMD_FAILURE_STAT_ANY,
+			.allowed = 3,
+		},
+		{}
+	};
+	struct scsi_failures failures = {
+		.failure_definitions = failure_defs,
+	};
 	const struct scsi_exec_args exec_args = {
 		.sshdr = &sshdr,
+		.failures = &failures,
 	};
-	int sense_valid = 0;
 
 	spintime = 0;
 
 	/* Spin up drives, as required.  Only do this at boot time */
 	/* Spinup needs to be done for module loads too. */
 	do {
-		retries = 0;
+		bool media_was_present = sdkp->media_present;
 
-		do {
-			bool media_was_present = sdkp->media_present;
+		scsi_reset_failures(&failures);
 
-			cmd[0] = TEST_UNIT_READY;
-			memset((void *) &cmd[1], 0, 9);
+		the_result = scsi_execute_cmd(sdkp->device, cmd, REQ_OP_DRV_IN,
+					      NULL, 0, SD_TIMEOUT,
+					      sdkp->max_retries, &exec_args);
 
-			the_result = scsi_execute_cmd(sdkp->device, cmd,
-						      REQ_OP_DRV_IN, NULL, 0,
-						      SD_TIMEOUT,
-						      sdkp->max_retries,
-						      &exec_args);
 
-			if (the_result > 0) {
-				/*
-				 * If the drive has indicated to us that it
-				 * doesn't have any media in it, don't bother
-				 * with any more polling.
-				 */
-				if (media_not_present(sdkp, &sshdr)) {
-					if (media_was_present)
-						sd_printk(KERN_NOTICE, sdkp,
-							  "Media removed, stopped polling\n");
-					return;
-				}
-
-				sense_valid = scsi_sense_valid(&sshdr);
+		if (the_result > 0) {
+			/*
+			 * If the drive has indicated to us that it doesn't
+			 * have any media in it, don't bother with any more
+			 * polling.
+			 */
+			if (media_not_present(sdkp, &sshdr)) {
+				if (media_was_present)
+					sd_printk(KERN_NOTICE, sdkp,
+						  "Media removed, stopped polling\n");
+				return;
 			}
-			retries++;
-		} while (retries < 3 &&
-			 (!scsi_status_is_good(the_result) ||
-			  (scsi_status_is_check_condition(the_result) &&
-			  sense_valid && sshdr.sense_key == UNIT_ATTENTION)));
+			sense_valid = scsi_sense_valid(&sshdr);
+		}
 
 		if (!scsi_status_is_check_condition(the_result)) {
 			/* no sense, TUR either succeeded or failed
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v12 07/20] scsi: hp_sw: Have scsi-ml retry scsi_execute_cmd errors
  2023-11-14  1:37 [PATCH v12 00/20] scsi: Allow scsi_execute users to request retries Mike Christie
                   ` (5 preceding siblings ...)
  2023-11-14  1:37 ` [PATCH v12 06/20] scsi: Have scsi-ml retry sd_spinup_disk errors Mike Christie
@ 2023-11-14  1:37 ` Mike Christie
  2023-11-14  1:37 ` [PATCH v12 08/20] scsi: rdac: Have scsi-ml retry send_mode_select errors Mike Christie
                   ` (13 subsequent siblings)
  20 siblings, 0 replies; 32+ messages in thread
From: Mike Christie @ 2023-11-14  1:37 UTC (permalink / raw)
  To: mwilck, john.g.garry, bvanassche, hch, martin.petersen,
	linux-scsi, james.bottomley
  Cc: Mike Christie

This has hp_sw have scsi-ml retry scsi_execute_cmd errors instead of
driving them itself.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/scsi/device_handler/scsi_dh_hp_sw.c | 49 ++++++++++++++-------
 1 file changed, 33 insertions(+), 16 deletions(-)

diff --git a/drivers/scsi/device_handler/scsi_dh_hp_sw.c b/drivers/scsi/device_handler/scsi_dh_hp_sw.c
index 944ea4e0cc45..b6eaf49dfb00 100644
--- a/drivers/scsi/device_handler/scsi_dh_hp_sw.c
+++ b/drivers/scsi/device_handler/scsi_dh_hp_sw.c
@@ -46,9 +46,6 @@ static int tur_done(struct scsi_device *sdev, struct hp_sw_dh_data *h,
 	int ret = SCSI_DH_IO;
 
 	switch (sshdr->sense_key) {
-	case UNIT_ATTENTION:
-		ret = SCSI_DH_IMM_RETRY;
-		break;
 	case NOT_READY:
 		if (sshdr->asc == 0x04 && sshdr->ascq == 2) {
 			/*
@@ -85,11 +82,24 @@ static int hp_sw_tur(struct scsi_device *sdev, struct hp_sw_dh_data *h)
 	int ret, res;
 	blk_opf_t opf = REQ_OP_DRV_IN | REQ_FAILFAST_DEV |
 				REQ_FAILFAST_TRANSPORT | REQ_FAILFAST_DRIVER;
+	struct scsi_failure failure_defs[] = {
+		{
+			.sense = UNIT_ATTENTION,
+			.asc = SCMD_FAILURE_ASC_ANY,
+			.ascq = SCMD_FAILURE_ASCQ_ANY,
+			.allowed = SCMD_FAILURE_NO_LIMIT,
+			.result = SAM_STAT_CHECK_CONDITION,
+		},
+		{}
+	};
+	struct scsi_failures failures = {
+		.failure_definitions = failure_defs,
+	};
 	const struct scsi_exec_args exec_args = {
 		.sshdr = &sshdr,
+		.failures = &failures,
 	};
 
-retry:
 	res = scsi_execute_cmd(sdev, cmd, opf, NULL, 0, HP_SW_TIMEOUT,
 			       HP_SW_RETRIES, &exec_args);
 	if (res > 0 && scsi_sense_valid(&sshdr)) {
@@ -104,9 +114,6 @@ static int hp_sw_tur(struct scsi_device *sdev, struct hp_sw_dh_data *h)
 		ret = SCSI_DH_IO;
 	}
 
-	if (ret == SCSI_DH_IMM_RETRY)
-		goto retry;
-
 	return ret;
 }
 
@@ -122,14 +129,31 @@ static int hp_sw_start_stop(struct hp_sw_dh_data *h)
 	struct scsi_sense_hdr sshdr;
 	struct scsi_device *sdev = h->sdev;
 	int res, rc;
-	int retry_cnt = HP_SW_RETRIES;
 	blk_opf_t opf = REQ_OP_DRV_IN | REQ_FAILFAST_DEV |
 				REQ_FAILFAST_TRANSPORT | REQ_FAILFAST_DRIVER;
+	struct scsi_failure failure_defs[] = {
+		{
+			/*
+			 * LUN not ready - manual intervention required
+			 *
+			 * Switch-over in progress, retry.
+			 */
+			.sense = NOT_READY,
+			.asc = 0x04,
+			.ascq = 0x03,
+			.allowed = HP_SW_RETRIES,
+			.result = SAM_STAT_CHECK_CONDITION,
+		},
+		{}
+	};
+	struct scsi_failures failures = {
+		.failure_definitions = failure_defs,
+	};
 	const struct scsi_exec_args exec_args = {
 		.sshdr = &sshdr,
+		.failures = &failures,
 	};
 
-retry:
 	res = scsi_execute_cmd(sdev, cmd, opf, NULL, 0, HP_SW_TIMEOUT,
 			       HP_SW_RETRIES, &exec_args);
 	if (!res) {
@@ -144,13 +168,6 @@ static int hp_sw_start_stop(struct hp_sw_dh_data *h)
 	switch (sshdr.sense_key) {
 	case NOT_READY:
 		if (sshdr.asc == 0x04 && sshdr.ascq == 3) {
-			/*
-			 * LUN not ready - manual intervention required
-			 *
-			 * Switch-over in progress, retry.
-			 */
-			if (--retry_cnt)
-				goto retry;
 			rc = SCSI_DH_RETRY;
 			break;
 		}
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v12 08/20] scsi: rdac: Have scsi-ml retry send_mode_select errors
  2023-11-14  1:37 [PATCH v12 00/20] scsi: Allow scsi_execute users to request retries Mike Christie
                   ` (6 preceding siblings ...)
  2023-11-14  1:37 ` [PATCH v12 07/20] scsi: hp_sw: Have scsi-ml retry scsi_execute_cmd errors Mike Christie
@ 2023-11-14  1:37 ` Mike Christie
  2023-11-14  1:37 ` [PATCH v12 09/20] scsi: spi: Have scsi-ml retry spi_execute UAs Mike Christie
                   ` (12 subsequent siblings)
  20 siblings, 0 replies; 32+ messages in thread
From: Mike Christie @ 2023-11-14  1:37 UTC (permalink / raw)
  To: mwilck, john.g.garry, bvanassche, hch, martin.petersen,
	linux-scsi, james.bottomley
  Cc: Mike Christie

This has rdac have scsi-ml retry errors instead of driving them itself.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/scsi/device_handler/scsi_dh_rdac.c | 84 ++++++++++++----------
 1 file changed, 46 insertions(+), 38 deletions(-)

diff --git a/drivers/scsi/device_handler/scsi_dh_rdac.c b/drivers/scsi/device_handler/scsi_dh_rdac.c
index 1ac2ae17e8be..f8a09e3eba58 100644
--- a/drivers/scsi/device_handler/scsi_dh_rdac.c
+++ b/drivers/scsi/device_handler/scsi_dh_rdac.c
@@ -485,43 +485,17 @@ static int set_mode_select(struct scsi_device *sdev, struct rdac_dh_data *h)
 static int mode_select_handle_sense(struct scsi_device *sdev,
 				    struct scsi_sense_hdr *sense_hdr)
 {
-	int err = SCSI_DH_IO;
 	struct rdac_dh_data *h = sdev->handler_data;
 
 	if (!scsi_sense_valid(sense_hdr))
-		goto done;
-
-	switch (sense_hdr->sense_key) {
-	case NO_SENSE:
-	case ABORTED_COMMAND:
-	case UNIT_ATTENTION:
-		err = SCSI_DH_RETRY;
-		break;
-	case NOT_READY:
-		if (sense_hdr->asc == 0x04 && sense_hdr->ascq == 0x01)
-			/* LUN Not Ready and is in the Process of Becoming
-			 * Ready
-			 */
-			err = SCSI_DH_RETRY;
-		break;
-	case ILLEGAL_REQUEST:
-		if (sense_hdr->asc == 0x91 && sense_hdr->ascq == 0x36)
-			/*
-			 * Command Lock contention
-			 */
-			err = SCSI_DH_IMM_RETRY;
-		break;
-	default:
-		break;
-	}
+		return SCSI_DH_IO;
 
 	RDAC_LOG(RDAC_LOG_FAILOVER, sdev, "array %s, ctlr %d, "
 		"MODE_SELECT returned with sense %02x/%02x/%02x",
 		(char *) h->ctlr->array_name, h->ctlr->index,
 		sense_hdr->sense_key, sense_hdr->asc, sense_hdr->ascq);
 
-done:
-	return err;
+	return SCSI_DH_IO;
 }
 
 static void send_mode_select(struct work_struct *work)
@@ -530,7 +504,7 @@ static void send_mode_select(struct work_struct *work)
 		container_of(work, struct rdac_controller, ms_work);
 	struct scsi_device *sdev = ctlr->ms_sdev;
 	struct rdac_dh_data *h = sdev->handler_data;
-	int rc, err, retry_cnt = RDAC_RETRY_COUNT;
+	int rc, err;
 	struct rdac_queue_data *tmp, *qdata;
 	LIST_HEAD(list);
 	unsigned char cdb[MAX_COMMAND_SIZE];
@@ -538,8 +512,49 @@ static void send_mode_select(struct work_struct *work)
 	unsigned int data_size;
 	blk_opf_t opf = REQ_OP_DRV_OUT | REQ_FAILFAST_DEV |
 				REQ_FAILFAST_TRANSPORT | REQ_FAILFAST_DRIVER;
+	struct scsi_failure failure_defs[] = {
+		{
+			.sense = NO_SENSE,
+			.asc = SCMD_FAILURE_ASC_ANY,
+			.ascq = SCMD_FAILURE_ASCQ_ANY,
+			.result = SAM_STAT_CHECK_CONDITION,
+		},
+		{
+			.sense = ABORTED_COMMAND,
+			.asc = SCMD_FAILURE_ASC_ANY,
+			.ascq = SCMD_FAILURE_ASCQ_ANY,
+			.result = SAM_STAT_CHECK_CONDITION,
+		},
+		{
+			.sense = UNIT_ATTENTION,
+			.asc = SCMD_FAILURE_ASC_ANY,
+			.ascq = SCMD_FAILURE_ASCQ_ANY,
+			.result = SAM_STAT_CHECK_CONDITION,
+		},
+		/* LUN Not Ready and is in the Process of Becoming Ready */
+		{
+			.sense = NOT_READY,
+			.asc = 0x04,
+			.ascq = 0x01,
+			.result = SAM_STAT_CHECK_CONDITION,
+		},
+		/* Command Lock contention */
+		{
+			.sense = ILLEGAL_REQUEST,
+			.asc = 0x91,
+			.ascq = 0x36,
+			.allowed = SCMD_FAILURE_NO_LIMIT,
+			.result = SAM_STAT_CHECK_CONDITION,
+		},
+		{}
+	};
+	struct scsi_failures failures = {
+		.total_allowed = RDAC_RETRY_COUNT,
+		.failure_definitions = failure_defs,
+	};
 	const struct scsi_exec_args exec_args = {
 		.sshdr = &sshdr,
+		.failures = &failures,
 	};
 
 	spin_lock(&ctlr->ms_lock);
@@ -548,15 +563,12 @@ static void send_mode_select(struct work_struct *work)
 	ctlr->ms_sdev = NULL;
 	spin_unlock(&ctlr->ms_lock);
 
- retry:
 	memset(cdb, 0, sizeof(cdb));
 
 	data_size = rdac_failover_get(ctlr, &list, cdb);
 
-	RDAC_LOG(RDAC_LOG_FAILOVER, sdev, "array %s, ctlr %d, "
-		"%s MODE_SELECT command",
-		(char *) h->ctlr->array_name, h->ctlr->index,
-		(retry_cnt == RDAC_RETRY_COUNT) ? "queueing" : "retrying");
+	RDAC_LOG(RDAC_LOG_FAILOVER, sdev, "array %s, ctlr %d, queueing MODE_SELECT command",
+		(char *)h->ctlr->array_name, h->ctlr->index);
 
 	rc = scsi_execute_cmd(sdev, cdb, opf, &h->ctlr->mode_select, data_size,
 			      RDAC_TIMEOUT * HZ, RDAC_RETRIES, &exec_args);
@@ -570,10 +582,6 @@ static void send_mode_select(struct work_struct *work)
 		err = SCSI_DH_IO;
 	} else {
 		err = mode_select_handle_sense(sdev, &sshdr);
-		if (err == SCSI_DH_RETRY && retry_cnt--)
-			goto retry;
-		if (err == SCSI_DH_IMM_RETRY)
-			goto retry;
 	}
 
 	list_for_each_entry_safe(qdata, tmp, &list, entry) {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v12 09/20] scsi: spi: Have scsi-ml retry spi_execute UAs
  2023-11-14  1:37 [PATCH v12 00/20] scsi: Allow scsi_execute users to request retries Mike Christie
                   ` (7 preceding siblings ...)
  2023-11-14  1:37 ` [PATCH v12 08/20] scsi: rdac: Have scsi-ml retry send_mode_select errors Mike Christie
@ 2023-11-14  1:37 ` Mike Christie
  2023-11-14  1:37 ` [PATCH v12 10/20] scsi: sd: Have scsi-ml retry sd_sync_cache errors Mike Christie
                   ` (11 subsequent siblings)
  20 siblings, 0 replies; 32+ messages in thread
From: Mike Christie @ 2023-11-14  1:37 UTC (permalink / raw)
  To: mwilck, john.g.garry, bvanassche, hch, martin.petersen,
	linux-scsi, james.bottomley
  Cc: Mike Christie

This has spi_execute have scsi-ml retry UAs instead of driving them.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/scsi/scsi_transport_spi.c | 35 ++++++++++++++++---------------
 1 file changed, 18 insertions(+), 17 deletions(-)

diff --git a/drivers/scsi/scsi_transport_spi.c b/drivers/scsi/scsi_transport_spi.c
index f668c1c0a98f..64852e6df3e3 100644
--- a/drivers/scsi/scsi_transport_spi.c
+++ b/drivers/scsi/scsi_transport_spi.c
@@ -108,29 +108,30 @@ static int spi_execute(struct scsi_device *sdev, const void *cmd,
 		       enum req_op op, void *buffer, unsigned int bufflen,
 		       struct scsi_sense_hdr *sshdr)
 {
-	int i, result;
-	struct scsi_sense_hdr sshdr_tmp;
 	blk_opf_t opf = op | REQ_FAILFAST_DEV | REQ_FAILFAST_TRANSPORT |
 			REQ_FAILFAST_DRIVER;
+	struct scsi_failure failure_defs[] = {
+		{
+			.sense = UNIT_ATTENTION,
+			.asc = SCMD_FAILURE_ASC_ANY,
+			.ascq = SCMD_FAILURE_ASCQ_ANY,
+			.allowed = DV_RETRIES,
+			.result = SAM_STAT_CHECK_CONDITION,
+		},
+		{}
+	};
+	struct scsi_failures failures = {
+		.failure_definitions = failure_defs,
+	};
 	const struct scsi_exec_args exec_args = {
+		/* bypass the SDEV_QUIESCE state with BLK_MQ_REQ_PM */
 		.req_flags = BLK_MQ_REQ_PM,
-		.sshdr = sshdr ? : &sshdr_tmp,
+		.sshdr = sshdr,
+		.failures = &failures,
 	};
 
-	sshdr = exec_args.sshdr;
-
-	for(i = 0; i < DV_RETRIES; i++) {
-		/*
-		 * The purpose of the RQF_PM flag below is to bypass the
-		 * SDEV_QUIESCE state.
-		 */
-		result = scsi_execute_cmd(sdev, cmd, opf, buffer, bufflen,
-					  DV_TIMEOUT, 1, &exec_args);
-		if (result < 0 || !scsi_sense_valid(sshdr) ||
-		    sshdr->sense_key != UNIT_ATTENTION)
-			break;
-	}
-	return result;
+	return scsi_execute_cmd(sdev, cmd, opf, buffer, bufflen, DV_TIMEOUT, 1,
+				&exec_args);
 }
 
 static struct {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v12 10/20] scsi: sd: Have scsi-ml retry sd_sync_cache errors
  2023-11-14  1:37 [PATCH v12 00/20] scsi: Allow scsi_execute users to request retries Mike Christie
                   ` (8 preceding siblings ...)
  2023-11-14  1:37 ` [PATCH v12 09/20] scsi: spi: Have scsi-ml retry spi_execute UAs Mike Christie
@ 2023-11-14  1:37 ` Mike Christie
  2023-11-14  1:37 ` [PATCH v12 11/20] scsi: ch: Remove unit_attention Mike Christie
                   ` (10 subsequent siblings)
  20 siblings, 0 replies; 32+ messages in thread
From: Mike Christie @ 2023-11-14  1:37 UTC (permalink / raw)
  To: mwilck, john.g.garry, bvanassche, hch, martin.petersen,
	linux-scsi, james.bottomley
  Cc: Mike Christie

This has sd_sync_cache have scsi-ml retry errors instead of driving them
itself.

There is one behavior change where we no longer retry when
scsi_execute_cmd returns < 0, but we should be ok. We don't need to retry
for failures like the queue being removed, and for the case where there
are no tags/reqs the block layer waits/retries for us. For possible memory
allocation failures from blk_rq_map_kern we use GFP_NOIO, so retrying
will probably not help.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/scsi/sd.c | 35 +++++++++++++++++------------------
 1 file changed, 17 insertions(+), 18 deletions(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index cda0d029ab7f..0d73145430a4 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1645,36 +1645,35 @@ static unsigned int sd_check_events(struct gendisk *disk, unsigned int clearing)
 
 static int sd_sync_cache(struct scsi_disk *sdkp)
 {
-	int retries, res;
+	int res;
 	struct scsi_device *sdp = sdkp->device;
 	const int timeout = sdp->request_queue->rq_timeout
 		* SD_FLUSH_TIMEOUT_MULTIPLIER;
+	/* Leave the rest of the command zero to indicate flush everything. */
+	const unsigned char cmd[16] = { sdp->use_16_for_sync ?
+				SYNCHRONIZE_CACHE_16 : SYNCHRONIZE_CACHE };
 	struct scsi_sense_hdr sshdr;
+	struct scsi_failure failure_defs[] = {
+		{
+			.allowed = 3,
+			.result = SCMD_FAILURE_RESULT_ANY,
+		},
+		{}
+	};
+	struct scsi_failures failures = {
+		.failure_definitions = failure_defs,
+	};
 	const struct scsi_exec_args exec_args = {
 		.req_flags = BLK_MQ_REQ_PM,
 		.sshdr = &sshdr,
+		.failures = &failures,
 	};
 
 	if (!scsi_device_online(sdp))
 		return -ENODEV;
 
-	for (retries = 3; retries > 0; --retries) {
-		unsigned char cmd[16] = { 0 };
-
-		if (sdp->use_16_for_sync)
-			cmd[0] = SYNCHRONIZE_CACHE_16;
-		else
-			cmd[0] = SYNCHRONIZE_CACHE;
-		/*
-		 * Leave the rest of the command zero to indicate
-		 * flush everything.
-		 */
-		res = scsi_execute_cmd(sdp, cmd, REQ_OP_DRV_IN, NULL, 0,
-				       timeout, sdkp->max_retries, &exec_args);
-		if (res == 0)
-			break;
-	}
-
+	res = scsi_execute_cmd(sdp, cmd, REQ_OP_DRV_IN, NULL, 0, timeout,
+			       sdkp->max_retries, &exec_args);
 	if (res) {
 		sd_print_result(sdkp, "Synchronize Cache(10) failed", res);
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v12 11/20] scsi: ch: Remove unit_attention
  2023-11-14  1:37 [PATCH v12 00/20] scsi: Allow scsi_execute users to request retries Mike Christie
                   ` (9 preceding siblings ...)
  2023-11-14  1:37 ` [PATCH v12 10/20] scsi: sd: Have scsi-ml retry sd_sync_cache errors Mike Christie
@ 2023-11-14  1:37 ` Mike Christie
  2023-11-14  1:37 ` [PATCH v12 12/20] scsi: ch: Have scsi-ml retry ch_do_scsi UAs Mike Christie
                   ` (9 subsequent siblings)
  20 siblings, 0 replies; 32+ messages in thread
From: Mike Christie @ 2023-11-14  1:37 UTC (permalink / raw)
  To: mwilck, john.g.garry, bvanassche, hch, martin.petersen,
	linux-scsi, james.bottomley
  Cc: Mike Christie

unit_attention is not used so remove it.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Martin Wilck <mwilck@suse.com>
---
 drivers/scsi/ch.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/scsi/ch.c b/drivers/scsi/ch.c
index cb0a399be1cc..1a998e45978e 100644
--- a/drivers/scsi/ch.c
+++ b/drivers/scsi/ch.c
@@ -113,7 +113,6 @@ typedef struct {
 	struct scsi_device  **dt;        /* ptrs to data transfer elements */
 	u_int               firsts[CH_TYPES];
 	u_int               counts[CH_TYPES];
-	u_int               unit_attention;
 	u_int		    voltags;
 	struct mutex	    lock;
 } scsi_changer;
@@ -208,7 +207,6 @@ ch_do_scsi(scsi_changer *ch, unsigned char *cmd, int cmd_len,
 
 		switch(sshdr.sense_key) {
 		case UNIT_ATTENTION:
-			ch->unit_attention = 1;
 			if (retries++ < 3)
 				goto retry;
 			break;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v12 12/20] scsi: ch: Have scsi-ml retry ch_do_scsi UAs
  2023-11-14  1:37 [PATCH v12 00/20] scsi: Allow scsi_execute users to request retries Mike Christie
                   ` (10 preceding siblings ...)
  2023-11-14  1:37 ` [PATCH v12 11/20] scsi: ch: Remove unit_attention Mike Christie
@ 2023-11-14  1:37 ` Mike Christie
  2023-11-14  1:37 ` [PATCH v12 13/20] scsi: Have scsi-ml retry scsi_mode_sense UAs Mike Christie
                   ` (8 subsequent siblings)
  20 siblings, 0 replies; 32+ messages in thread
From: Mike Christie @ 2023-11-14  1:37 UTC (permalink / raw)
  To: mwilck, john.g.garry, bvanassche, hch, martin.petersen,
	linux-scsi, james.bottomley
  Cc: Mike Christie

This has ch_do_scsi have scsi-ml retry UAs instead of driving them
itself.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/scsi/ch.c | 25 +++++++++++++++----------
 1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/drivers/scsi/ch.c b/drivers/scsi/ch.c
index 1a998e45978e..4811d15b8fc3 100644
--- a/drivers/scsi/ch.c
+++ b/drivers/scsi/ch.c
@@ -185,17 +185,29 @@ static int
 ch_do_scsi(scsi_changer *ch, unsigned char *cmd, int cmd_len,
 	   void *buffer, unsigned int buflength, enum req_op op)
 {
-	int errno, retries = 0, timeout, result;
+	int errno = 0, timeout, result;
 	struct scsi_sense_hdr sshdr;
+	struct scsi_failure failure_defs[] = {
+		{
+			.sense = UNIT_ATTENTION,
+			.asc = SCMD_FAILURE_ASC_ANY,
+			.ascq = SCMD_FAILURE_ASCQ_ANY,
+			.allowed = 3,
+			.result = SAM_STAT_CHECK_CONDITION,
+		},
+		{}
+	};
+	struct scsi_failures failures = {
+		.failure_definitions = failure_defs,
+	};
 	const struct scsi_exec_args exec_args = {
 		.sshdr = &sshdr,
+		.failures = &failures,
 	};
 
 	timeout = (cmd[0] == INITIALIZE_ELEMENT_STATUS)
 		? timeout_init : timeout_move;
 
- retry:
-	errno = 0;
 	result = scsi_execute_cmd(ch->device, cmd, op, buffer, buflength,
 				  timeout * HZ, MAX_RETRIES, &exec_args);
 	if (result < 0)
@@ -204,13 +216,6 @@ ch_do_scsi(scsi_changer *ch, unsigned char *cmd, int cmd_len,
 		if (debug)
 			scsi_print_sense_hdr(ch->device, ch->name, &sshdr);
 		errno = ch_find_errno(&sshdr);
-
-		switch(sshdr.sense_key) {
-		case UNIT_ATTENTION:
-			if (retries++ < 3)
-				goto retry;
-			break;
-		}
 	}
 	return errno;
 }
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v12 13/20] scsi: Have scsi-ml retry scsi_mode_sense UAs
  2023-11-14  1:37 [PATCH v12 00/20] scsi: Allow scsi_execute users to request retries Mike Christie
                   ` (11 preceding siblings ...)
  2023-11-14  1:37 ` [PATCH v12 12/20] scsi: ch: Have scsi-ml retry ch_do_scsi UAs Mike Christie
@ 2023-11-14  1:37 ` Mike Christie
  2023-11-14  1:37 ` [PATCH v12 14/20] scsi: Have scsi-ml retry scsi_report_lun_scan errors Mike Christie
                   ` (7 subsequent siblings)
  20 siblings, 0 replies; 32+ messages in thread
From: Mike Christie @ 2023-11-14  1:37 UTC (permalink / raw)
  To: mwilck, john.g.garry, bvanassche, hch, martin.petersen,
	linux-scsi, james.bottomley
  Cc: Mike Christie

This has scsi_mode_sense have scsi-ml retry UAs instead of driving them
itself.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/scsi/scsi_lib.c | 22 +++++++++++++++-------
 1 file changed, 15 insertions(+), 7 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index dee43c6f7ad0..69c79725a1cb 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -2262,11 +2262,25 @@ scsi_mode_sense(struct scsi_device *sdev, int dbd, int modepage, int subpage,
 	unsigned char cmd[12];
 	int use_10_for_ms;
 	int header_length;
-	int result, retry_count = retries;
+	int result;
 	struct scsi_sense_hdr my_sshdr;
+	struct scsi_failure failure_defs[] = {
+		{
+			.sense = UNIT_ATTENTION,
+			.asc = SCMD_FAILURE_ASC_ANY,
+			.ascq = SCMD_FAILURE_ASCQ_ANY,
+			.allowed = retries,
+			.result = SAM_STAT_CHECK_CONDITION,
+		},
+		{}
+	};
+	struct scsi_failures failures = {
+		.failure_definitions = failure_defs,
+	};
 	const struct scsi_exec_args exec_args = {
 		/* caller might not be interested in sense, but we need it */
 		.sshdr = sshdr ? : &my_sshdr,
+		.failures = &failures,
 	};
 
 	memset(data, 0, sizeof(*data));
@@ -2328,12 +2342,6 @@ scsi_mode_sense(struct scsi_device *sdev, int dbd, int modepage, int subpage,
 					goto retry;
 				}
 			}
-			if (scsi_status_is_check_condition(result) &&
-			    sshdr->sense_key == UNIT_ATTENTION &&
-			    retry_count) {
-				retry_count--;
-				goto retry;
-			}
 		}
 		return -EIO;
 	}
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v12 14/20] scsi: Have scsi-ml retry scsi_report_lun_scan errors
  2023-11-14  1:37 [PATCH v12 00/20] scsi: Allow scsi_execute users to request retries Mike Christie
                   ` (12 preceding siblings ...)
  2023-11-14  1:37 ` [PATCH v12 13/20] scsi: Have scsi-ml retry scsi_mode_sense UAs Mike Christie
@ 2023-11-14  1:37 ` Mike Christie
  2023-11-14  1:37 ` [PATCH v12 15/20] scsi: sd: Have pr commands retry UAs Mike Christie
                   ` (6 subsequent siblings)
  20 siblings, 0 replies; 32+ messages in thread
From: Mike Christie @ 2023-11-14  1:37 UTC (permalink / raw)
  To: mwilck, john.g.garry, bvanassche, hch, martin.petersen,
	linux-scsi, james.bottomley
  Cc: Mike Christie

This has scsi_report_lun_scan have scsi-ml retry errors instead of driving
them itself.

There is one behavior change where we no longer retry when
scsi_execute_cmd returns < 0, but we should be ok. We don't need to retry
for failures like the queue being removed, and for the case where there
are no tags/reqs the block layer waits/retries for us. For possible memory
allocation failures from blk_rq_map_kern we use GFP_NOIO, so retrying
will probably not help.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/scsi/scsi_scan.c | 57 +++++++++++++++++++++++-----------------
 1 file changed, 33 insertions(+), 24 deletions(-)

diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index 8da0990b2dde..dce2d75e394a 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -1416,14 +1416,34 @@ static int scsi_report_lun_scan(struct scsi_target *starget, blist_flags_t bflag
 	unsigned int length;
 	u64 lun;
 	unsigned int num_luns;
-	unsigned int retries;
 	int result;
 	struct scsi_lun *lunp, *lun_data;
-	struct scsi_sense_hdr sshdr;
 	struct scsi_device *sdev;
 	struct Scsi_Host *shost = dev_to_shost(&starget->dev);
+	struct scsi_failure failure_defs[] = {
+		{
+			.sense = UNIT_ATTENTION,
+			.asc = SCMD_FAILURE_ASC_ANY,
+			.ascq = SCMD_FAILURE_ASCQ_ANY,
+			.result = SAM_STAT_CHECK_CONDITION,
+		},
+		/* Fail all CCs except the UA above */
+		{
+			.sense = SCMD_FAILURE_SENSE_ANY,
+			.result = SAM_STAT_CHECK_CONDITION,
+		},
+		/* Retry any other errors not listed above */
+		{
+			.result = SCMD_FAILURE_RESULT_ANY,
+		},
+		{}
+	};
+	struct scsi_failures failures = {
+		.total_allowed = 3,
+		.failure_definitions = failure_defs,
+	};
 	const struct scsi_exec_args exec_args = {
-		.sshdr = &sshdr,
+		.failures = &failures,
 	};
 	int ret = 0;
 
@@ -1494,29 +1514,18 @@ static int scsi_report_lun_scan(struct scsi_target *starget, blist_flags_t bflag
 	 * should come through as a check condition, and will not generate
 	 * a retry.
 	 */
-	for (retries = 0; retries < 3; retries++) {
-		SCSI_LOG_SCAN_BUS(3, sdev_printk (KERN_INFO, sdev,
-				"scsi scan: Sending REPORT LUNS to (try %d)\n",
-				retries));
-
-		result = scsi_execute_cmd(sdev, scsi_cmd, REQ_OP_DRV_IN,
-					  lun_data, length,
-					  SCSI_REPORT_LUNS_TIMEOUT, 3,
-					  &exec_args);
+	scsi_reset_failures(&failures);
 
-		SCSI_LOG_SCAN_BUS(3, sdev_printk (KERN_INFO, sdev,
-				"scsi scan: REPORT LUNS"
-				" %s (try %d) result 0x%x\n",
-				result ?  "failed" : "successful",
-				retries, result));
-		if (result == 0)
-			break;
-		else if (scsi_sense_valid(&sshdr)) {
-			if (sshdr.sense_key != UNIT_ATTENTION)
-				break;
-		}
-	}
+	SCSI_LOG_SCAN_BUS(3, sdev_printk (KERN_INFO, sdev,
+			  "scsi scan: Sending REPORT LUNS\n"));
+
+	result = scsi_execute_cmd(sdev, scsi_cmd, REQ_OP_DRV_IN, lun_data,
+				  length, SCSI_REPORT_LUNS_TIMEOUT, 3,
+				  &exec_args);
 
+	SCSI_LOG_SCAN_BUS(3, sdev_printk (KERN_INFO, sdev,
+			  "scsi scan: REPORT LUNS  %s result 0x%x\n",
+			  result ?  "failed" : "successful", result));
 	if (result) {
 		/*
 		 * The device probably does not support a REPORT LUN command
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v12 15/20] scsi: sd: Have pr commands retry UAs
  2023-11-14  1:37 [PATCH v12 00/20] scsi: Allow scsi_execute users to request retries Mike Christie
                   ` (13 preceding siblings ...)
  2023-11-14  1:37 ` [PATCH v12 14/20] scsi: Have scsi-ml retry scsi_report_lun_scan errors Mike Christie
@ 2023-11-14  1:37 ` Mike Christie
  2023-11-14  1:37 ` [PATCH v12 16/20] scsi: sd: Have scsi-ml retry read_capacity_10 errors Mike Christie
                   ` (5 subsequent siblings)
  20 siblings, 0 replies; 32+ messages in thread
From: Mike Christie @ 2023-11-14  1:37 UTC (permalink / raw)
  To: mwilck, john.g.garry, bvanassche, hch, martin.petersen,
	linux-scsi, james.bottomley
  Cc: Mike Christie

It's common to get a UA when doing PR commands. It could be due to a
target restarting, transport level relogin or other PR commands like a
release causing it. The upper layers don't get the sense and in some cases
have no idea if it's a SCSI device, so this has the sd layer retry.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/scsi/sd.c | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 0d73145430a4..87a8feabc2b4 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1800,8 +1800,22 @@ static int sd_pr_in_command(struct block_device *bdev, u8 sa,
 	struct scsi_device *sdev = sdkp->device;
 	struct scsi_sense_hdr sshdr;
 	u8 cmd[10] = { PERSISTENT_RESERVE_IN, sa };
+	struct scsi_failure failure_defs[] = {
+		{
+			.sense = UNIT_ATTENTION,
+			.asc = SCMD_FAILURE_ASC_ANY,
+			.ascq = SCMD_FAILURE_ASCQ_ANY,
+			.allowed = 5,
+			.result = SAM_STAT_CHECK_CONDITION,
+		},
+		{}
+	};
+	struct scsi_failures failures = {
+		.failure_definitions = failure_defs,
+	};
 	const struct scsi_exec_args exec_args = {
 		.sshdr = &sshdr,
+		.failures = &failures,
 	};
 	int result;
 
@@ -1888,8 +1902,22 @@ static int sd_pr_out_command(struct block_device *bdev, u8 sa, u64 key,
 	struct scsi_disk *sdkp = scsi_disk(bdev->bd_disk);
 	struct scsi_device *sdev = sdkp->device;
 	struct scsi_sense_hdr sshdr;
+	struct scsi_failure failure_defs[] = {
+		{
+			.sense = UNIT_ATTENTION,
+			.asc = SCMD_FAILURE_ASC_ANY,
+			.ascq = SCMD_FAILURE_ASCQ_ANY,
+			.allowed = 5,
+			.result = SAM_STAT_CHECK_CONDITION,
+		},
+		{}
+	};
+	struct scsi_failures failures = {
+		.failure_definitions = failure_defs,
+	};
 	const struct scsi_exec_args exec_args = {
 		.sshdr = &sshdr,
+		.failures = &failures,
 	};
 	int result;
 	u8 cmd[16] = { 0, };
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v12 16/20] scsi: sd: Have scsi-ml retry read_capacity_10 errors
  2023-11-14  1:37 [PATCH v12 00/20] scsi: Allow scsi_execute users to request retries Mike Christie
                   ` (14 preceding siblings ...)
  2023-11-14  1:37 ` [PATCH v12 15/20] scsi: sd: Have pr commands retry UAs Mike Christie
@ 2023-11-14  1:37 ` Mike Christie
  2023-11-14  1:37 ` [PATCH v12 17/20] scsi: ses: Have scsi-ml retry scsi_execute_cmd errors Mike Christie
                   ` (4 subsequent siblings)
  20 siblings, 0 replies; 32+ messages in thread
From: Mike Christie @ 2023-11-14  1:37 UTC (permalink / raw)
  To: mwilck, john.g.garry, bvanassche, hch, martin.petersen,
	linux-scsi, james.bottomley
  Cc: Mike Christie

This has read_capacity_10 have scsi-ml retry errors instead of driving
them itself.

There are 2 behavior changes with this patch:
1. There is one behavior change where we no longer retry when
scsi_execute_cmd returns < 0, but we should be ok. We don't need to retry
for failures like the queue being removed, and for the case where there
are no tags/reqs since the block layer waits/retries for us. For possible
memory allocation failures from blk_rq_map_kern we use GFP_NOIO, so
retrying will probably not help.
2. For the specific UAs we checked for and retried, we would get
READ_CAPACITY_RETRIES_ON_RESET retries plus whatever retries were left
from the main loop's retries. Each UA now gets
READ_CAPACITY_RETRIES_ON_RESET reties, and the other errors get up to 3
retries. This is most likely ok, because READ_CAPACITY_RETRIES_ON_RESET
is already 10 and is not based on anything specific like a spec or
device, so the extra 3 we got from the main loop was probably just an
accident and is not going to help.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/scsi/sd.c | 62 +++++++++++++++++++++++++++++------------------
 1 file changed, 39 insertions(+), 23 deletions(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 87a8feabc2b4..cd6a6b31433c 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -2622,42 +2622,58 @@ static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp,
 static int read_capacity_10(struct scsi_disk *sdkp, struct scsi_device *sdp,
 						unsigned char *buffer)
 {
-	unsigned char cmd[16];
+	static const u8 cmd[10] = { READ_CAPACITY };
 	struct scsi_sense_hdr sshdr;
+	struct scsi_failure failure_defs[] = {
+		/* Do not retry Medium Not Present */
+		{
+			.sense = UNIT_ATTENTION,
+			.asc = 0x3A,
+			.result = SAM_STAT_CHECK_CONDITION,
+		},
+		{
+			.sense = NOT_READY,
+			.asc = 0x3A,
+			.result = SAM_STAT_CHECK_CONDITION,
+		},
+		 /* Device reset might occur several times so retry a lot */
+		{
+			.sense = UNIT_ATTENTION,
+			.asc = 0x29,
+			.allowed = READ_CAPACITY_RETRIES_ON_RESET,
+			.result = SAM_STAT_CHECK_CONDITION,
+		},
+		/* Any other error not listed above retry 3 times */
+		{
+			.result = SCMD_FAILURE_RESULT_ANY,
+			.allowed = 3,
+		},
+		{}
+	};
+	struct scsi_failures failures = {
+		.failure_definitions = failure_defs,
+	};
 	const struct scsi_exec_args exec_args = {
 		.sshdr = &sshdr,
+		.failures = &failures,
 	};
 	int sense_valid = 0;
 	int the_result;
-	int retries = 3, reset_retries = READ_CAPACITY_RETRIES_ON_RESET;
 	sector_t lba;
 	unsigned sector_size;
 
-	do {
-		cmd[0] = READ_CAPACITY;
-		memset(&cmd[1], 0, 9);
-		memset(buffer, 0, 8);
+	memset(buffer, 0, 8);
+
+	the_result = scsi_execute_cmd(sdp, cmd, REQ_OP_DRV_IN, buffer,
+				      8, SD_TIMEOUT, sdkp->max_retries,
+				      &exec_args);
 
-		the_result = scsi_execute_cmd(sdp, cmd, REQ_OP_DRV_IN, buffer,
-					      8, SD_TIMEOUT, sdkp->max_retries,
-					      &exec_args);
+	if (the_result > 0) {
+		sense_valid = scsi_sense_valid(&sshdr);
 
 		if (media_not_present(sdkp, &sshdr))
 			return -ENODEV;
-
-		if (the_result > 0) {
-			sense_valid = scsi_sense_valid(&sshdr);
-			if (sense_valid &&
-			    sshdr.sense_key == UNIT_ATTENTION &&
-			    sshdr.asc == 0x29 && sshdr.ascq == 0x00)
-				/* Device reset might occur several times,
-				 * give it one more chance */
-				if (--reset_retries > 0)
-					continue;
-		}
-		retries--;
-
-	} while (the_result && retries);
+	}
 
 	if (the_result) {
 		sd_print_result(sdkp, "Read Capacity(10) failed", the_result);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v12 17/20] scsi: ses: Have scsi-ml retry scsi_execute_cmd errors
  2023-11-14  1:37 [PATCH v12 00/20] scsi: Allow scsi_execute users to request retries Mike Christie
                   ` (15 preceding siblings ...)
  2023-11-14  1:37 ` [PATCH v12 16/20] scsi: sd: Have scsi-ml retry read_capacity_10 errors Mike Christie
@ 2023-11-14  1:37 ` Mike Christie
  2023-11-14  1:37 ` [PATCH v12 18/20] scsi: sr: Have scsi-ml retry get_sectorsize errors Mike Christie
                   ` (3 subsequent siblings)
  20 siblings, 0 replies; 32+ messages in thread
From: Mike Christie @ 2023-11-14  1:37 UTC (permalink / raw)
  To: mwilck, john.g.garry, bvanassche, hch, martin.petersen,
	linux-scsi, james.bottomley
  Cc: Mike Christie

This has ses have scsi-ml retry scsi_execute_cmd errors instead of
driving them itself.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/scsi/ses.c | 66 ++++++++++++++++++++++++++++++++--------------
 1 file changed, 46 insertions(+), 20 deletions(-)

diff --git a/drivers/scsi/ses.c b/drivers/scsi/ses.c
index d7d0c35c58b8..0f2c87cc95e6 100644
--- a/drivers/scsi/ses.c
+++ b/drivers/scsi/ses.c
@@ -87,19 +87,32 @@ static int ses_recv_diag(struct scsi_device *sdev, int page_code,
 		0
 	};
 	unsigned char recv_page_code;
-	unsigned int retries = SES_RETRIES;
-	struct scsi_sense_hdr sshdr;
+	struct scsi_failure failure_defs[] = {
+		{
+			.sense = UNIT_ATTENTION,
+			.asc = 0x29,
+			.ascq = SCMD_FAILURE_ASCQ_ANY,
+			.allowed = SES_RETRIES,
+			.result = SAM_STAT_CHECK_CONDITION,
+		},
+		{
+			.sense = NOT_READY,
+			.asc = SCMD_FAILURE_ASC_ANY,
+			.ascq = SCMD_FAILURE_ASCQ_ANY,
+			.allowed = SES_RETRIES,
+			.result = SAM_STAT_CHECK_CONDITION,
+		},
+		{}
+	};
+	struct scsi_failures failures = {
+		.failure_definitions = failure_defs,
+	};
 	const struct scsi_exec_args exec_args = {
-		.sshdr = &sshdr,
+		.failures = &failures,
 	};
 
-	do {
-		ret = scsi_execute_cmd(sdev, cmd, REQ_OP_DRV_IN, buf, bufflen,
-				       SES_TIMEOUT, 1, &exec_args);
-	} while (ret > 0 && --retries && scsi_sense_valid(&sshdr) &&
-		 (sshdr.sense_key == NOT_READY ||
-		  (sshdr.sense_key == UNIT_ATTENTION && sshdr.asc == 0x29)));
-
+	ret = scsi_execute_cmd(sdev, cmd, REQ_OP_DRV_IN, buf, bufflen,
+			       SES_TIMEOUT, 1, &exec_args);
 	if (unlikely(ret))
 		return ret;
 
@@ -131,19 +144,32 @@ static int ses_send_diag(struct scsi_device *sdev, int page_code,
 		bufflen & 0xff,
 		0
 	};
-	struct scsi_sense_hdr sshdr;
-	unsigned int retries = SES_RETRIES;
+	struct scsi_failure failure_defs[] = {
+		{
+			.sense = UNIT_ATTENTION,
+			.asc = 0x29,
+			.ascq = SCMD_FAILURE_ASCQ_ANY,
+			.allowed = SES_RETRIES,
+			.result = SAM_STAT_CHECK_CONDITION,
+		},
+		{
+			.sense = NOT_READY,
+			.asc = SCMD_FAILURE_ASC_ANY,
+			.ascq = SCMD_FAILURE_ASCQ_ANY,
+			.allowed = SES_RETRIES,
+			.result = SAM_STAT_CHECK_CONDITION,
+		},
+		{}
+	};
+	struct scsi_failures failures = {
+		.failure_definitions = failure_defs,
+	};
 	const struct scsi_exec_args exec_args = {
-		.sshdr = &sshdr,
+		.failures = &failures,
 	};
 
-	do {
-		result = scsi_execute_cmd(sdev, cmd, REQ_OP_DRV_OUT, buf,
-					  bufflen, SES_TIMEOUT, 1, &exec_args);
-	} while (result > 0 && --retries && scsi_sense_valid(&sshdr) &&
-		 (sshdr.sense_key == NOT_READY ||
-		  (sshdr.sense_key == UNIT_ATTENTION && sshdr.asc == 0x29)));
-
+	result = scsi_execute_cmd(sdev, cmd, REQ_OP_DRV_OUT, buf, bufflen,
+				  SES_TIMEOUT, 1, &exec_args);
 	if (result)
 		sdev_printk(KERN_ERR, sdev, "SEND DIAGNOSTIC result: %8x\n",
 			    result);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v12 18/20] scsi: sr: Have scsi-ml retry get_sectorsize errors
  2023-11-14  1:37 [PATCH v12 00/20] scsi: Allow scsi_execute users to request retries Mike Christie
                   ` (16 preceding siblings ...)
  2023-11-14  1:37 ` [PATCH v12 17/20] scsi: ses: Have scsi-ml retry scsi_execute_cmd errors Mike Christie
@ 2023-11-14  1:37 ` Mike Christie
  2023-11-14  1:37 ` [PATCH v12 19/20] scsi: ufs: Have scsi-ml retry start stop errors Mike Christie
                   ` (2 subsequent siblings)
  20 siblings, 0 replies; 32+ messages in thread
From: Mike Christie @ 2023-11-14  1:37 UTC (permalink / raw)
  To: mwilck, john.g.garry, bvanassche, hch, martin.petersen,
	linux-scsi, james.bottomley
  Cc: Mike Christie

This has get_sectorsize have scsi-ml retry errors instead of driving them
itself.

There is one behavior change where we no longer retry when
scsi_execute_cmd returns < 0, but we should be ok. We don't need to retry
for failures like the queue being removed, and for the case where there
are no tags/reqs the block layer waits/retries for us. For possible memory
allocation failures from blk_rq_map_kern we use GFP_NOIO, so retrying
will probably not help.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/scsi/sr.c | 38 ++++++++++++++++++++------------------
 1 file changed, 20 insertions(+), 18 deletions(-)

diff --git a/drivers/scsi/sr.c b/drivers/scsi/sr.c
index d093dd187b2f..268b3a40891e 100644
--- a/drivers/scsi/sr.c
+++ b/drivers/scsi/sr.c
@@ -717,27 +717,29 @@ static int sr_probe(struct device *dev)
 
 static void get_sectorsize(struct scsi_cd *cd)
 {
-	unsigned char cmd[10];
-	unsigned char buffer[8];
-	int the_result, retries = 3;
+	static const u8 cmd[10] = { READ_CAPACITY };
+	unsigned char buffer[8] = { };
+	int the_result;
 	int sector_size;
 	struct request_queue *queue;
+	struct scsi_failure failure_defs[] = {
+		{
+			.result = SCMD_FAILURE_RESULT_ANY,
+			.allowed = 3,
+		},
+		{}
+	};
+	struct scsi_failures failures = {
+		.failure_definitions = failure_defs,
+	};
+	const struct scsi_exec_args exec_args = {
+		.failures = &failures,
+	};
 
-	do {
-		cmd[0] = READ_CAPACITY;
-		memset((void *) &cmd[1], 0, 9);
-		memset(buffer, 0, sizeof(buffer));
-
-		/* Do the command and wait.. */
-		the_result = scsi_execute_cmd(cd->device, cmd, REQ_OP_DRV_IN,
-					      buffer, sizeof(buffer),
-					      SR_TIMEOUT, MAX_RETRIES, NULL);
-
-		retries--;
-
-	} while (the_result && retries);
-
-
+	/* Do the command and wait.. */
+	the_result = scsi_execute_cmd(cd->device, cmd, REQ_OP_DRV_IN, buffer,
+				      sizeof(buffer), SR_TIMEOUT, MAX_RETRIES,
+				      &exec_args);
 	if (the_result) {
 		cd->capacity = 0x1fffff;
 		sector_size = 2048;	/* A guess, just in case */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v12 19/20] scsi: ufs: Have scsi-ml retry start stop errors
  2023-11-14  1:37 [PATCH v12 00/20] scsi: Allow scsi_execute users to request retries Mike Christie
                   ` (17 preceding siblings ...)
  2023-11-14  1:37 ` [PATCH v12 18/20] scsi: sr: Have scsi-ml retry get_sectorsize errors Mike Christie
@ 2023-11-14  1:37 ` Mike Christie
  2023-11-14  1:37 ` [PATCH v12 20/20] scsi: Add kunit tests for scsi_check_passthrough Mike Christie
  2023-11-16 11:02 ` [PATCH v12 00/20] scsi: Allow scsi_execute users to request retries John Garry
  20 siblings, 0 replies; 32+ messages in thread
From: Mike Christie @ 2023-11-14  1:37 UTC (permalink / raw)
  To: mwilck, john.g.garry, bvanassche, hch, martin.petersen,
	linux-scsi, james.bottomley
  Cc: Mike Christie

This has scsi-ml retry errors instead of driving them itself.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/ufs/core/ufshcd.c | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
index 8b1031fb0a44..001b27ef2bdb 100644
--- a/drivers/ufs/core/ufshcd.c
+++ b/drivers/ufs/core/ufshcd.c
@@ -9384,7 +9384,17 @@ static int ufshcd_execute_start_stop(struct scsi_device *sdev,
 				     struct scsi_sense_hdr *sshdr)
 {
 	const unsigned char cdb[6] = { START_STOP, 0, 0, 0, pwr_mode << 4, 0 };
+	struct scsi_failure failure_defs[] = {
+		{
+			.allowed = 2,
+			.result = SCMD_FAILURE_RESULT_ANY,
+		},
+	};
+	struct scsi_failures failures = {
+		.failure_definitions = failure_defs,
+	};
 	const struct scsi_exec_args args = {
+		.failures = &failures,
 		.sshdr = sshdr,
 		.req_flags = BLK_MQ_REQ_PM,
 		.scmd_flags = SCMD_FAIL_IF_RECOVERING,
@@ -9410,7 +9420,7 @@ static int ufshcd_set_dev_pwr_mode(struct ufs_hba *hba,
 	struct scsi_sense_hdr sshdr;
 	struct scsi_device *sdp;
 	unsigned long flags;
-	int ret, retries;
+	int ret;
 
 	spin_lock_irqsave(hba->host->host_lock, flags);
 	sdp = hba->ufs_device_wlun;
@@ -9436,15 +9446,7 @@ static int ufshcd_set_dev_pwr_mode(struct ufs_hba *hba,
 	 * callbacks hence set the RQF_PM flag so that it doesn't resume the
 	 * already suspended childs.
 	 */
-	for (retries = 3; retries > 0; --retries) {
-		ret = ufshcd_execute_start_stop(sdp, pwr_mode, &sshdr);
-		/*
-		 * scsi_execute() only returns a negative value if the request
-		 * queue is dying.
-		 */
-		if (ret <= 0)
-			break;
-	}
+	ret = ufshcd_execute_start_stop(sdp, pwr_mode, &sshdr);
 	if (ret) {
 		sdev_printk(KERN_WARNING, sdp,
 			    "START_STOP failed for power mode: %d, result %x\n",
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v12 20/20] scsi: Add kunit tests for scsi_check_passthrough
  2023-11-14  1:37 [PATCH v12 00/20] scsi: Allow scsi_execute users to request retries Mike Christie
                   ` (18 preceding siblings ...)
  2023-11-14  1:37 ` [PATCH v12 19/20] scsi: ufs: Have scsi-ml retry start stop errors Mike Christie
@ 2023-11-14  1:37 ` Mike Christie
  2023-11-14 18:38   ` Bart Van Assche
  2023-11-16 11:02 ` [PATCH v12 00/20] scsi: Allow scsi_execute users to request retries John Garry
  20 siblings, 1 reply; 32+ messages in thread
From: Mike Christie @ 2023-11-14  1:37 UTC (permalink / raw)
  To: mwilck, john.g.garry, bvanassche, hch, martin.petersen,
	linux-scsi, james.bottomley
  Cc: Mike Christie

Add some kunit tests for scsi_check_passthrough so we can easily make sure
we are hitting the cases it's difficult to replicate in hardware or even
scsi_debug.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/scsi/Kconfig         |   9 +
 drivers/scsi/scsi_lib.c      |   4 +
 drivers/scsi/scsi_lib_test.c | 330 +++++++++++++++++++++++++++++++++++
 3 files changed, 343 insertions(+)
 create mode 100644 drivers/scsi/scsi_lib_test.c

diff --git a/drivers/scsi/Kconfig b/drivers/scsi/Kconfig
index addac7fbe37b..e4ecf6491c41 100644
--- a/drivers/scsi/Kconfig
+++ b/drivers/scsi/Kconfig
@@ -67,6 +67,15 @@ config SCSI_PROC_FS
 
 	  If unsure say Y.
 
+config SCSI_KUNIT_TEST
+	tristate "KUnit tests for SCSI Mid Layer" if !KUNIT_ALL_TESTS
+	depends on KUNIT
+	default KUNIT_ALL_TESTS
+	help
+	  Run SCSI Mid Layer's KUnit tests.
+
+	  If unsure say N.
+
 comment "SCSI support type (disk, tape, CD-ROM)"
 	depends on SCSI
 
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 69c79725a1cb..b3b3ed8d435b 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -3434,3 +3434,7 @@ void scsi_build_sense(struct scsi_cmnd *scmd, int desc, u8 key, u8 asc, u8 ascq)
 	scmd->result = SAM_STAT_CHECK_CONDITION;
 }
 EXPORT_SYMBOL_GPL(scsi_build_sense);
+
+#ifdef CONFIG_SCSI_KUNIT_TEST
+#include "scsi_lib_test.c"
+#endif
diff --git a/drivers/scsi/scsi_lib_test.c b/drivers/scsi/scsi_lib_test.c
new file mode 100644
index 000000000000..b4f7576edf2c
--- /dev/null
+++ b/drivers/scsi/scsi_lib_test.c
@@ -0,0 +1,330 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * KUnit tests for scsi_lib.c.
+ *
+ * Copyright (C) 2023, Oracle Corporation
+ */
+#include <kunit/test.h>
+
+#include <scsi/scsi_proto.h>
+#include <scsi/scsi_cmnd.h>
+#include <scsi/scsi_device.h>
+
+#define SCSI_LIB_TEST_MAX_ALLOWED 3
+#define SCSI_LIB_TEST_TOTAL_MAX_ALLOWED 5
+
+static void scsi_lib_test_multiple_sense(struct kunit *test)
+{
+	struct scsi_failure multiple_sense_failure_defs[] = {
+		{
+			.sense = DATA_PROTECT,
+			.asc = 0x1,
+			.ascq = 0x1,
+			.result = SAM_STAT_CHECK_CONDITION,
+		},
+		{
+			.sense = UNIT_ATTENTION,
+			.asc = 0x11,
+			.ascq = 0x0,
+			.allowed = SCSI_LIB_TEST_MAX_ALLOWED,
+			.result = SAM_STAT_CHECK_CONDITION,
+		},
+		{
+			.sense = NOT_READY,
+			.asc = 0x11,
+			.ascq = 0x22,
+			.allowed = SCSI_LIB_TEST_MAX_ALLOWED,
+			.result = SAM_STAT_CHECK_CONDITION,
+		},
+		{
+			.sense = ABORTED_COMMAND,
+			.asc = 0x11,
+			.ascq = SCMD_FAILURE_ASCQ_ANY,
+			.allowed = SCSI_LIB_TEST_MAX_ALLOWED,
+			.result = SAM_STAT_CHECK_CONDITION,
+		},
+		{
+			.sense = HARDWARE_ERROR,
+			.asc = SCMD_FAILURE_ASC_ANY,
+			.allowed = SCSI_LIB_TEST_MAX_ALLOWED,
+			.result = SAM_STAT_CHECK_CONDITION,
+		},
+		{
+			.sense = ILLEGAL_REQUEST,
+			.asc = 0x91,
+			.ascq = 0x36,
+			.allowed = SCSI_LIB_TEST_MAX_ALLOWED,
+			.result = SAM_STAT_CHECK_CONDITION,
+		},
+		{}
+	};
+	struct scsi_failures failures = {
+		.failure_definitions = multiple_sense_failure_defs,
+	};
+	u8 sense[SCSI_SENSE_BUFFERSIZE] = {};
+	struct scsi_cmnd sc = {
+		.sense_buffer = sense,
+	};
+	int i;
+
+	/* Match end of array */
+	scsi_build_sense(&sc, 0, ILLEGAL_REQUEST, 0x91, 0x36);
+	KUNIT_EXPECT_EQ(test, -EAGAIN, scsi_check_passthrough(&sc, &failures));
+	/* Basic match in array */
+	scsi_build_sense(&sc, 0, UNIT_ATTENTION, 0x11, 0x0);
+	KUNIT_EXPECT_EQ(test, -EAGAIN, scsi_check_passthrough(&sc, &failures));
+	/* No matching sense entry */
+	scsi_build_sense(&sc, 0, MISCOMPARE, 0x11, 0x11);
+	KUNIT_EXPECT_EQ(test, 0, scsi_check_passthrough(&sc, &failures));
+	/* Match using SCMD_FAILURE_ASCQ_ANY */
+	scsi_build_sense(&sc, 0, ABORTED_COMMAND, 0x11, 0x22);
+	KUNIT_EXPECT_EQ(test, -EAGAIN, scsi_check_passthrough(&sc, &failures));
+	/* Fail to match */
+	scsi_build_sense(&sc, 0, ABORTED_COMMAND, 0x22, 0x22);
+	KUNIT_EXPECT_EQ(test, 0, scsi_check_passthrough(&sc, &failures));
+	/* Match using SCMD_FAILURE_ASC_ANY */
+	scsi_build_sense(&sc, 0, HARDWARE_ERROR, 0x11, 0x22);
+	KUNIT_EXPECT_EQ(test, -EAGAIN, scsi_check_passthrough(&sc, &failures));
+	/* No matching status entry */
+	sc.result = SAM_STAT_RESERVATION_CONFLICT;
+	KUNIT_EXPECT_EQ(test, 0, scsi_check_passthrough(&sc, &failures));
+
+	/* Test hitting allowed limit */
+	scsi_build_sense(&sc, 0, NOT_READY, 0x11, 0x22);
+	for (i = 0; i < SCSI_LIB_TEST_MAX_ALLOWED; i++)
+		KUNIT_EXPECT_EQ(test, -EAGAIN, scsi_check_passthrough(&sc,
+				&failures));
+	KUNIT_EXPECT_EQ(test, 0, scsi_check_passthrough(&sc, &failures));
+
+	/* reset retries so we can retest */
+	failures.failure_definitions = multiple_sense_failure_defs;
+	scsi_reset_failures(&failures);
+
+	/* Test no retries allowed */
+	scsi_build_sense(&sc, 0, DATA_PROTECT, 0x1, 0x1);
+	KUNIT_EXPECT_EQ(test, 0, scsi_check_passthrough(&sc, &failures));
+}
+
+static void scsi_lib_test_any_sense(struct kunit *test)
+{
+	struct scsi_failure any_sense_failure_defs[] = {
+		{
+			.result = SCMD_FAILURE_SENSE_ANY,
+			.allowed = SCSI_LIB_TEST_MAX_ALLOWED,
+		},
+		{}
+	};
+	struct scsi_failures failures = {
+		.failure_definitions = any_sense_failure_defs,
+	};
+	u8 sense[SCSI_SENSE_BUFFERSIZE] = {};
+	struct scsi_cmnd sc = {
+		.sense_buffer = sense,
+	};
+
+	/* Match using SCMD_FAILURE_SENSE_ANY */
+	failures.failure_definitions = any_sense_failure_defs;
+	scsi_build_sense(&sc, 0, MEDIUM_ERROR, 0x11, 0x22);
+	KUNIT_EXPECT_EQ(test, -EAGAIN, scsi_check_passthrough(&sc, &failures));
+}
+
+static void scsi_lib_test_host(struct kunit *test)
+{
+	struct scsi_failure retryable_host_failure_defs[] = {
+		{
+			.result = DID_TRANSPORT_DISRUPTED << 16,
+			.allowed = SCSI_LIB_TEST_MAX_ALLOWED,
+		},
+		{
+			.result = DID_TIME_OUT << 16,
+			.allowed = SCSI_LIB_TEST_MAX_ALLOWED,
+		},
+		{}
+	};
+	struct scsi_failures failures = {
+		.failure_definitions = retryable_host_failure_defs,
+	};
+	u8 sense[SCSI_SENSE_BUFFERSIZE] = {};
+	struct scsi_cmnd sc = {
+		.sense_buffer = sense,
+	};
+
+	/* No matching host byte entry */
+	failures.failure_definitions = retryable_host_failure_defs;
+	sc.result = DID_NO_CONNECT << 16;
+	KUNIT_EXPECT_EQ(test, 0, scsi_check_passthrough(&sc, &failures));
+	/* Matching host byte entry */
+	sc.result = DID_TIME_OUT << 16;
+	KUNIT_EXPECT_EQ(test, -EAGAIN, scsi_check_passthrough(&sc, &failures));
+}
+
+static void scsi_lib_test_any_failure(struct kunit *test)
+{
+	struct scsi_failure any_failure_defs[] = {
+		{
+			.result = SCMD_FAILURE_RESULT_ANY,
+			.allowed = SCSI_LIB_TEST_MAX_ALLOWED,
+		},
+		{}
+	};
+	struct scsi_failures failures = {
+		.failure_definitions = any_failure_defs,
+	};
+	u8 sense[SCSI_SENSE_BUFFERSIZE] = {};
+	struct scsi_cmnd sc = {
+		.sense_buffer = sense,
+	};
+
+	/* Match SCMD_FAILURE_RESULT_ANY */
+	failures.failure_definitions = any_failure_defs;
+	sc.result = DID_TRANSPORT_FAILFAST << 16;
+	KUNIT_EXPECT_EQ(test, -EAGAIN, scsi_check_passthrough(&sc, &failures));
+}
+
+static void scsi_lib_test_any_status(struct kunit *test)
+{
+	struct scsi_failure any_status_failure_defs[] = {
+		{
+			.result = SCMD_FAILURE_STAT_ANY,
+			.allowed = SCSI_LIB_TEST_MAX_ALLOWED,
+		},
+		{}
+	};
+	struct scsi_failures failures = {
+		.failure_definitions = any_status_failure_defs,
+	};
+	u8 sense[SCSI_SENSE_BUFFERSIZE] = {};
+	struct scsi_cmnd sc = {
+		.sense_buffer = sense,
+	};
+
+	/* Test any status handling */
+	failures.failure_definitions = any_status_failure_defs;
+	sc.result = SAM_STAT_RESERVATION_CONFLICT;
+	KUNIT_EXPECT_EQ(test, -EAGAIN, scsi_check_passthrough(&sc, &failures));
+}
+
+static void scsi_lib_test_total_allowed(struct kunit *test)
+{
+	struct scsi_failure total_allowed_defs[] = {
+		{
+			.sense = UNIT_ATTENTION,
+			.asc = SCMD_FAILURE_ASC_ANY,
+			.ascq = SCMD_FAILURE_ASCQ_ANY,
+			.result = SAM_STAT_CHECK_CONDITION,
+                },
+		/* Fail all CCs except the UA above */
+		{
+			.sense = SCMD_FAILURE_SENSE_ANY,
+			.result = SAM_STAT_CHECK_CONDITION,
+		},
+		/* Retry any other errors not listed above */
+		{
+			.result = SCMD_FAILURE_RESULT_ANY,
+		},
+		{}
+        };
+	struct scsi_failures failures = {
+		.failure_definitions = total_allowed_defs,
+	};
+	u8 sense[SCSI_SENSE_BUFFERSIZE] = {};
+	struct scsi_cmnd sc = {
+		.sense_buffer = sense,
+	};
+	int i;
+
+	/* Test total_allowed */
+	failures.failure_definitions = total_allowed_defs;
+	scsi_reset_failures(&failures);
+	failures.total_allowed = SCSI_LIB_TEST_TOTAL_MAX_ALLOWED;
+
+	scsi_build_sense(&sc, 0, UNIT_ATTENTION, 0x28, 0x0);
+	for (i = 0; i < SCSI_LIB_TEST_TOTAL_MAX_ALLOWED; i++)
+		/* Retry since we under the total_allowed limit */
+		KUNIT_EXPECT_EQ(test, -EAGAIN, scsi_check_passthrough(&sc,
+				&failures));
+	sc.result = DID_TIME_OUT << 16;
+	/* We have now hit the total_allowed limit so no more retries */
+	KUNIT_EXPECT_EQ(test, 0, scsi_check_passthrough(&sc, &failures));
+}
+
+static void scsi_lib_test_mixed_total(struct kunit *test)
+{
+	struct scsi_failure mixed_total_defs[] = {
+		{
+			.sense = UNIT_ATTENTION,
+			.asc = 0x28,
+			.result = SAM_STAT_CHECK_CONDITION,
+		},
+		{
+			.sense = UNIT_ATTENTION,
+			.asc = 0x29,
+			.result = SAM_STAT_CHECK_CONDITION,
+		},
+		{
+			.allowed = 1,
+			.result = DID_TIME_OUT << 16,
+		},
+		{}
+	};
+	u8 sense[SCSI_SENSE_BUFFERSIZE] = {};
+	struct scsi_failures failures = {
+		.failure_definitions = mixed_total_defs,
+	};
+	struct scsi_cmnd sc = {
+		.sense_buffer = sense,
+	};
+	int i;
+
+	/*
+	 * Test total_allowed when there is a mix of per failure allowed
+	 * and total_allowed limits.
+	 */
+	failures.failure_definitions = mixed_total_defs;
+	scsi_reset_failures(&failures);
+	failures.total_allowed = SCSI_LIB_TEST_TOTAL_MAX_ALLOWED;
+
+	scsi_build_sense(&sc, 0, UNIT_ATTENTION, 0x28, 0x0);
+	for (i = 0; i < SCSI_LIB_TEST_TOTAL_MAX_ALLOWED; i++)
+		/* Retry since we under the total_allowed limit */
+		KUNIT_EXPECT_EQ(test, -EAGAIN, scsi_check_passthrough(&sc,
+				&failures));
+	/* Do not retry since we are now over total_allowed limit */
+	KUNIT_EXPECT_EQ(test, 0, scsi_check_passthrough(&sc, &failures));
+
+	scsi_reset_failures(&failures);
+	scsi_build_sense(&sc, 0, UNIT_ATTENTION, 0x28, 0x0);
+	for (i = 0; i < SCSI_LIB_TEST_TOTAL_MAX_ALLOWED; i++)
+		/* Retry since we under the total_allowed limit */
+		KUNIT_EXPECT_EQ(test, -EAGAIN, scsi_check_passthrough(&sc,
+				&failures));
+	sc.result = DID_TIME_OUT << 16;
+	/* Retry because this failure has a per failure limit */
+	KUNIT_EXPECT_EQ(test, -EAGAIN, scsi_check_passthrough(&sc, &failures));
+	scsi_build_sense(&sc, 0, UNIT_ATTENTION, 0x29, 0x0);
+	/* total_allowed is now hit so no more retries */
+	KUNIT_EXPECT_EQ(test, 0, scsi_check_passthrough(&sc, &failures));
+}
+
+static void scsi_lib_test_check_passthough(struct kunit *test)
+{
+	scsi_lib_test_multiple_sense(test);
+	scsi_lib_test_any_sense(test);
+	scsi_lib_test_host(test);
+	scsi_lib_test_any_failure(test);
+	scsi_lib_test_any_status(test);
+	scsi_lib_test_total_allowed(test);
+	scsi_lib_test_mixed_total(test);
+}
+
+static struct kunit_case scsi_lib_test_cases[] = {
+	KUNIT_CASE(scsi_lib_test_check_passthough),
+	{}
+};
+
+static struct kunit_suite scsi_lib_test_suite = {
+	.name = "scsi_lib",
+	.test_cases = scsi_lib_test_cases,
+};
+
+kunit_test_suite(scsi_lib_test_suite);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH v12 20/20] scsi: Add kunit tests for scsi_check_passthrough
  2023-11-14  1:37 ` [PATCH v12 20/20] scsi: Add kunit tests for scsi_check_passthrough Mike Christie
@ 2023-11-14 18:38   ` Bart Van Assche
  0 siblings, 0 replies; 32+ messages in thread
From: Bart Van Assche @ 2023-11-14 18:38 UTC (permalink / raw)
  To: Mike Christie, mwilck, john.g.garry, hch, martin.petersen,
	linux-scsi, james.bottomley

On 11/13/23 17:37, Mike Christie wrote:
> Add some kunit tests for scsi_check_passthrough so we can easily make sure
> we are hitting the cases it's difficult to replicate in hardware or even
                      ^^^^^^^^^^

It seems like the words "for which" are missing here?

> +config SCSI_KUNIT_TEST
> +	tristate "KUnit tests for SCSI Mid Layer" if !KUNIT_ALL_TESTS
> +	depends on KUNIT
> +	default KUNIT_ALL_TESTS
> +	help
> +	  Run SCSI Mid Layer's KUnit tests.
> +
> +	  If unsure say N.
> +

Is a help text required for KUNIT Kconfig entries or is this something that
can be left out?

Please rename SCSI_KUNIT_TEST into SCSI_LIB_KUNIT_TEST such that different
unit tests can be selected individually. More tests will be added by my
patch series that improves zoned write performance.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v12 01/20] scsi: Allow passthrough to request scsi-ml retries
  2023-11-14  1:37 ` [PATCH v12 01/20] scsi: Allow passthrough to request scsi-ml retries Mike Christie
@ 2023-11-16 10:53   ` John Garry
  0 siblings, 0 replies; 32+ messages in thread
From: John Garry @ 2023-11-16 10:53 UTC (permalink / raw)
  To: Mike Christie, mwilck, bvanassche, hch, martin.petersen,
	linux-scsi, james.bottomley

On 14/11/2023 01:37, Mike Christie wrote:
> For passthrough we don't retry any error we get a check condition for.

nit: For passthrough we don't retry any error which we get a check 
condition for.

> This results in a lot of callers driving their own retries for all UAs,
> specific UAs, NOT_READY, specific sense values or any type of failure.
> 
> This adds the core code to allow passthrough users to specify what errors
> they want scsi-ml to retry for them. We can then convert users to drop
> a lot of their sense parsing and retry handling.
> 
> Signed-off-by: Mike Christie <michael.christie@oracle.com>

Looks ok to me, so feel free to pick up:
Reviewed-by: John Garry <john.g.garry@oracle.com>

> ---
>   drivers/scsi/scsi_lib.c    | 92 ++++++++++++++++++++++++++++++++++++++
>   include/scsi/scsi_device.h | 48 ++++++++++++++++++++
>   2 files changed, 140 insertions(+)
> 
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index cf3864f72093..dee43c6f7ad0 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -184,6 +184,92 @@ void scsi_queue_insert(struct scsi_cmnd *cmd, int reason)
>   	__scsi_queue_insert(cmd, reason, true);
>   }
>   
> +void scsi_reset_failures(struct scsi_failures *failures)

nit: maybe scsi_reset_failures_retries as the name, to be more specific

> +{
> +	struct scsi_failure *failure;
> +
> +	failures->total_retries = 0;
> +
> +	for (failure = failures->failure_definitions; failure->result;
> +	     failure++)
> +		failure->retries = 0;
> +}
> +EXPORT_SYMBOL_GPL(scsi_reset_failures);
> +
> +/**
> + * scsi_check_passthrough - Determine if passthrough scsi_cmnd needs a retry.
> + * @scmd: scsi_cmnd to check.
> + * @failures: scsi_failures struct that lists failures to check for.
> + *
> + * Returns -EAGAIN if the caller should retry else 0.
> + */
> +static int scsi_check_passthrough(struct scsi_cmnd *scmd,
> +				  struct scsi_failures *failures)
> +{
> +	struct scsi_failure *failure;
> +	struct scsi_sense_hdr sshdr;
> +	enum sam_status status;
> +
> +	if (!failures)
> +		return 0;
> +
> +	for (failure = failures->failure_definitions; failure->result;
> +	     failure++) {
> +		if (failure->result == SCMD_FAILURE_RESULT_ANY)
> +			goto maybe_retry;
> +
> +		if (host_byte(scmd->result) &&
> +		    host_byte(scmd->result) == host_byte(failure->result))
> +			goto maybe_retry;
> +
> +		status = status_byte(scmd->result);
> +		if (!status)
> +			continue;
> +
> +		if (failure->result == SCMD_FAILURE_STAT_ANY &&
> +		    !scsi_status_is_good(scmd->result))
> +			goto maybe_retry;
> +
> +		if (status != status_byte(failure->result))
> +			continue;
> +
> +		if (status_byte(failure->result) != SAM_STAT_CHECK_CONDITION ||
> +		    failure->sense == SCMD_FAILURE_SENSE_ANY)
> +			goto maybe_retry;
> +
> +		if (!scsi_command_normalize_sense(scmd, &sshdr))
> +			return 0;
> +
> +		if (failure->sense != sshdr.sense_key)
> +			continue;
> +
> +		if (failure->asc == SCMD_FAILURE_ASC_ANY)
> +			goto maybe_retry;
> +
> +		if (failure->asc != sshdr.asc)
> +			continue;
> +
> +		if (failure->ascq == SCMD_FAILURE_ASCQ_ANY ||
> +		    failure->ascq == sshdr.ascq)
> +			goto maybe_retry;
> +	}
> +
> +	return 0;
> +
> +maybe_retry:
> +	if (failure->allowed) {
> +		if (failure->allowed == SCMD_FAILURE_NO_LIMIT ||
> +		    ++failure->retries <= failure->allowed)
> +			return -EAGAIN;
> +	} else {
> +		if (failures->total_allowed == SCMD_FAILURE_NO_LIMIT ||
> +		    ++failures->total_retries <= failures->total_allowed)
> +			return -EAGAIN;
> +	}
> +
> +	return 0;
> +}
> +
>   /**
>    * scsi_execute_cmd - insert request and wait for the result
>    * @sdev:	scsi_device
> @@ -214,6 +300,7 @@ int scsi_execute_cmd(struct scsi_device *sdev, const unsigned char *cmd,
>   			      args->sense_len != SCSI_SENSE_BUFFERSIZE))
>   		return -EINVAL;
>   
> +retry:
>   	req = scsi_alloc_request(sdev->request_queue, opf, args->req_flags);
>   	if (IS_ERR(req))
>   		return PTR_ERR(req);
> @@ -237,6 +324,11 @@ int scsi_execute_cmd(struct scsi_device *sdev, const unsigned char *cmd,
>   	 */
>   	blk_execute_rq(req, true);
>   
> +	if (scsi_check_passthrough(scmd, args->failures) == -EAGAIN) {
> +		blk_mq_free_request(req);
> +		goto retry;
> +	}
> +
>   	/*
>   	 * Some devices (USB mass-storage in particular) may transfer
>   	 * garbage data together with a residue indicating that the data
> diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
> index 10480eb582b2..c92d6d9e644e 100644
> --- a/include/scsi/scsi_device.h
> +++ b/include/scsi/scsi_device.h
> @@ -483,6 +483,52 @@ extern int scsi_is_sdev_device(const struct device *);
>   extern int scsi_is_target_device(const struct device *);
>   extern void scsi_sanitize_inquiry_string(unsigned char *s, int len);
>   
> +/*
> + * scsi_execute_cmd users can set scsi_failure.result to have
> + * scsi_check_passthrough fail/retry a command. scsi_failure.result can be a
> + * specific host byte or message code, or SCMD_FAILURE_RESULT_ANY can be used
> + * to match any host or message code.
> + */
> +#define SCMD_FAILURE_RESULT_ANY	0x7fffffff
> +/*
> + * Set scsi_failure.result to SCMD_FAILURE_STAT_ANY to fail/retry any failure
> + * scsi_status_is_good returns false for.
> + */
> +#define SCMD_FAILURE_STAT_ANY	0xff
> +/*
> + * The following can be set to the scsi_failure sense, asc and ascq fields to
> + * match on any sense, ASC, or ASCQ value.
> + */
> +#define SCMD_FAILURE_SENSE_ANY	0xff
> +#define SCMD_FAILURE_ASC_ANY	0xff
> +#define SCMD_FAILURE_ASCQ_ANY	0xff
> +/* Always retry a matching failure. */
> +#define SCMD_FAILURE_NO_LIMIT	-1
> +
> +struct scsi_failure {
> +	int result;
> +	u8 sense;
> +	u8 asc;
> +	u8 ascq;
> +
> +	/*
> +	 * Number of attempts allowed for this failure. It does not count
> +	 * for the total_allowed.
> +	 */
> +	s8 allowed;
> +	s8 retries;
> +};
> +
> +struct scsi_failures {
> +	/*
> +	 * If the failure does not have a specific limit in the scsi_failure
> +	 * then this limit is followed.
> +	 */
> +	int total_allowed;
> +	int total_retries;
> +	struct scsi_failure *failure_definitions;
> +};
> +
>   /* Optional arguments to scsi_execute_cmd */
>   struct scsi_exec_args {
>   	unsigned char *sense;		/* sense buffer */
> @@ -491,12 +537,14 @@ struct scsi_exec_args {
>   	blk_mq_req_flags_t req_flags;	/* BLK_MQ_REQ flags */
>   	int scmd_flags;			/* SCMD flags */
>   	int *resid;			/* residual length */
> +	struct scsi_failures *failures;	/* failures to retry */
>   };
>   
>   int scsi_execute_cmd(struct scsi_device *sdev, const unsigned char *cmd,
>   		     blk_opf_t opf, void *buffer, unsigned int bufflen,
>   		     int timeout, int retries,
>   		     const struct scsi_exec_args *args);
> +void scsi_reset_failures(struct scsi_failures *failures);
>   
>   extern void sdev_disable_disk_events(struct scsi_device *sdev);
>   extern void sdev_enable_disk_events(struct scsi_device *sdev);


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v12 00/20] scsi: Allow scsi_execute users to request retries
  2023-11-14  1:37 [PATCH v12 00/20] scsi: Allow scsi_execute users to request retries Mike Christie
                   ` (19 preceding siblings ...)
  2023-11-14  1:37 ` [PATCH v12 20/20] scsi: Add kunit tests for scsi_check_passthrough Mike Christie
@ 2023-11-16 11:02 ` John Garry
  2023-11-16 16:29   ` Mike Christie
  20 siblings, 1 reply; 32+ messages in thread
From: John Garry @ 2023-11-16 11:02 UTC (permalink / raw)
  To: Mike Christie, mwilck, bvanassche, hch, martin.petersen,
	linux-scsi, james.bottomley

On 14/11/2023 01:37, Mike Christie wrote:
> The following patches were made over Martin's 6.7 staging branch
> which contains a sd change that this patch requires.

Hi Mike,

It would be nice if we could apply to 6.8 staging - was the sd.c change 
not in v6.7-rc1?

Cheers,
John

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v12 02/20] scsi: Have scsi-ml retry scsi_probe_lun errors
  2023-11-14  1:37 ` [PATCH v12 02/20] scsi: Have scsi-ml retry scsi_probe_lun errors Mike Christie
@ 2023-11-16 11:14   ` John Garry
  2023-11-16 16:38     ` Mike Christie
  0 siblings, 1 reply; 32+ messages in thread
From: John Garry @ 2023-11-16 11:14 UTC (permalink / raw)
  To: Mike Christie, mwilck, bvanassche, hch, martin.petersen,
	linux-scsi, james.bottomley

On 14/11/2023 01:37, Mike Christie wrote:
>   
>   	/* Each pass gets up to three chances to ignore Unit Attention */
> +	scsi_reset_failures(&failures);
> +
>   	for (count = 0; count < 3; ++count) {

So do we keep this loop for the USB workaround (not shown)? I just 
wonder why we retry in this outer loop as well as now inside 
scsi_execute_cmd(). Maybe I'm missing something ....

Thanks,
John

>   		memset(scsi_cmd, 0, 6);
>   		scsi_cmd[0] = INQUIRY;


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v12 04/20] scsi: sd: Have scsi-ml retry read_capacity_16 errors
  2023-11-14  1:37 ` [PATCH v12 04/20] scsi: sd: Have scsi-ml retry read_capacity_16 errors Mike Christie
@ 2023-11-16 11:39   ` John Garry
  2023-11-16 17:15     ` Mike Christie
  0 siblings, 1 reply; 32+ messages in thread
From: John Garry @ 2023-11-16 11:39 UTC (permalink / raw)
  To: Mike Christie, mwilck, bvanassche, hch, martin.petersen,
	linux-scsi, james.bottomley

On 14/11/2023 01:37, Mike Christie wrote:

Feel free to pick up:
Reviewed-by: John Garry <john.g.garry@oracle.com>

> +	the_result = scsi_execute_cmd(sdp, cmd, REQ_OP_DRV_IN, buffer,
> +				      RC16_LEN, SD_TIMEOUT, sdkp->max_retries,

BTW, some might find it a little confusing that we have 
scsi_execute_cmd() retries arg as well as being able to pass 
exec_args.failure, and exec_args.failure may allow us to retry. Could 
this be improved, even in terms of arg or struct member naming/comments?

Thanks,
John

> +				      &exec_args);
>   
> -	} while (the_result && retries);
> +	if (the_result > 0) {
> +		if (media_not_present(sdkp, &sshdr))
> +			return -ENODEV;
> +
> +		sense_valid = scsi_sense_valid(&sshdr);
> +		if (sense_valid && sshdr.sense_key == ILLEGAL_REQUEST &&
> +		    (sshdr.asc == 0x20 || sshdr.asc == 0x24) &&
> +		     sshdr.ascq == 0x00) {
> +			/*
> +			 * Invalid Command Operation Code or Invalid Field in
> +			 * CDB, just retry silently with RC10
> +			 */
> +			return -EINVAL;
> +		}
> +	}


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v12 06/20] scsi: Have scsi-ml retry sd_spinup_disk errors
  2023-11-14  1:37 ` [PATCH v12 06/20] scsi: Have scsi-ml retry sd_spinup_disk errors Mike Christie
@ 2023-11-16 12:13   ` John Garry
  2023-11-16 16:44     ` Mike Christie
  0 siblings, 1 reply; 32+ messages in thread
From: John Garry @ 2023-11-16 12:13 UTC (permalink / raw)
  To: Mike Christie, mwilck, bvanassche, hch, martin.petersen,
	linux-scsi, james.bottomley

On 14/11/2023 01:37, Mike Christie wrote:

Hi Mike,

> This simplifies sd_spinup_disk so scsi-ml retries errors for it. Note that
> we retried specifically on a UA and also if scsi_status_is_good returned
> failed which would happen for all check conditions. In this patch we use
> SCMD_FAILURE_STAT_ANY which will trigger for the same conditions as
> when scsi_status_is_good returns false and there is status. This will
> cover all CCs including UAs so there is no explicit failures arrary entry

/s/arrary/array/

> for UAs.

But the first failure_defs member is for UNIT_ATTENTION, below, right?

Thanks,
John

> 
> There is one behavior change where we no longer retry when
> scsi_execute_cmd returns < 0, but we should be ok. We don't need to retry
> for failures like the queue being removed, and for the case where there
> are no tags/reqs the block layer waits/retries for us. For possible memory
> allocation failures from blk_rq_map_kern we use GFP_NOIO, so retrying
> will probably not help.
> 
> We do not handle the outside loop's retries because we want to sleep
> between tries and we don't support that yet.
> 
> Signed-off-by: Mike Christie<michael.christie@oracle.com>
> ---
>   drivers/scsi/sd.c | 77 +++++++++++++++++++++++++++--------------------
>   1 file changed, 45 insertions(+), 32 deletions(-)
> 
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index 641f9c9c0674..cda0d029ab7f 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -2235,55 +2235,68 @@ static int sd_done(struct scsi_cmnd *SCpnt)
>   static void
>   sd_spinup_disk(struct scsi_disk *sdkp)
>   {
> -	unsigned char cmd[10];
> +	static const u8 cmd[10] = { TEST_UNIT_READY };
>   	unsigned long spintime_expire = 0;
> -	int retries, spintime;
> +	int spintime, sense_valid = 0;
>   	unsigned int the_result;
>   	struct scsi_sense_hdr sshdr;
> +	struct scsi_failure failure_defs[] = {
> +		/* Do not retry Medium Not Present */
> +		{
> +			.sense = UNIT_ATTENTION,
> +			.asc = 0x3A,
> +			.ascq = SCMD_FAILURE_ASCQ_ANY,
> +			.result = SAM_STAT_CHECK_CONDITION,
> +		},
> +		{
> +			.sense = NOT_READY,
> +			.asc = 0x3A,
> +			.ascq = SCMD_FAILURE_ASCQ_ANY,
> +			.result = SAM_STAT_CHECK_CONDITION,
> +		},
> +		/* Retry when scsi_status_is_good would return false 3 times */
> +		{
> +			.result = SCMD_FAILURE_STAT_ANY,
> +			.allowed = 3,
> +		},
> +		{}


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v12 00/20] scsi: Allow scsi_execute users to request retries
  2023-11-16 11:02 ` [PATCH v12 00/20] scsi: Allow scsi_execute users to request retries John Garry
@ 2023-11-16 16:29   ` Mike Christie
  0 siblings, 0 replies; 32+ messages in thread
From: Mike Christie @ 2023-11-16 16:29 UTC (permalink / raw)
  To: John Garry, mwilck, bvanassche, hch, martin.petersen, linux-scsi,
	james.bottomley

On 11/16/23 5:02 AM, John Garry wrote:
> On 14/11/2023 01:37, Mike Christie wrote:
>> The following patches were made over Martin's 6.7 staging branch
>> which contains a sd change that this patch requires.
> 
> Hi Mike,
> 
> It would be nice if we could apply to 6.8 staging - was the sd.c change not in v6.7-rc1?
> 

Staging didn't exist when I posted this.

It exists now, but is missing:

https://lore.kernel.org/linux-scsi/yq1zfzn4p8a.fsf@ca-mkp.ca.oracle.com/T/#u

so will not apply to 6.8.

I was actually shooting for 6.9 so I'll figure it out by then.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v12 02/20] scsi: Have scsi-ml retry scsi_probe_lun errors
  2023-11-16 11:14   ` John Garry
@ 2023-11-16 16:38     ` Mike Christie
  0 siblings, 0 replies; 32+ messages in thread
From: Mike Christie @ 2023-11-16 16:38 UTC (permalink / raw)
  To: John Garry, mwilck, bvanassche, hch, martin.petersen, linux-scsi,
	james.bottomley

On 11/16/23 5:14 AM, John Garry wrote:
> On 14/11/2023 01:37, Mike Christie wrote:
>>         /* Each pass gets up to three chances to ignore Unit Attention */
>> +    scsi_reset_failures(&failures);
>> +
>>       for (count = 0; count < 3; ++count) {
> 
> So do we keep this loop for the USB workaround (not shown)? I just wonder why we retry in this outer loop as well as now inside scsi_execute_cmd(). Maybe I'm missing something ....

The outer loop is for the inquiry len retries not shown. I didn't
know it was for usb. It's where we retry with a different inquiry len
to try and get more of the data. scsi_execute_cmd doesn't support that
type of thing where the data len change between retries so we still
have this loop.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v12 06/20] scsi: Have scsi-ml retry sd_spinup_disk errors
  2023-11-16 12:13   ` John Garry
@ 2023-11-16 16:44     ` Mike Christie
  0 siblings, 0 replies; 32+ messages in thread
From: Mike Christie @ 2023-11-16 16:44 UTC (permalink / raw)
  To: John Garry, mwilck, bvanassche, hch, martin.petersen, linux-scsi,
	james.bottomley

On 11/16/23 6:13 AM, John Garry wrote:
> On 14/11/2023 01:37, Mike Christie wrote:
> 
> Hi Mike,
> 
>> This simplifies sd_spinup_disk so scsi-ml retries errors for it. Note that
>> we retried specifically on a UA and also if scsi_status_is_good returned
>> failed which would happen for all check conditions. In this patch we use
>> SCMD_FAILURE_STAT_ANY which will trigger for the same conditions as
>> when scsi_status_is_good returns false and there is status. This will
>> cover all CCs including UAs so there is no explicit failures arrary entry
> 
> /s/arrary/array/
> 
>> for UAs.
> 
> But the first failure_defs member is for UNIT_ATTENTION, below, right?
> 

I should not have written "on a UA" above. We want to retry every UA
except that first entries in that array.

This first and second entry says if we see it then fail. However, if
it's not that specific failure value, then the last entry
SCMD_FAILURE_STAT_ANY kicks in and scsi_execute_cmd will retry.

It's so for this chunk:

-			the_result = scsi_execute_cmd(sdkp->device, cmd,
-						      REQ_OP_DRV_IN, NULL, 0,
-						      SD_TIMEOUT,
-						      sdkp->max_retries,
-						      &exec_args);
 
-			if (the_result > 0) {
-				/*
-				 * If the drive has indicated to us that it
-				 * doesn't have any media in it, don't bother
-				 * with any more polling.
-				 */
-				if (media_not_present(sdkp, &sshdr)) {
-					if (media_was_present)
-						sd_printk(KERN_NOTICE, sdkp,
-							  "Media removed, stopped polling\n");
-					return;
-				}

we will not retry if we get media_not_present but then retry here:

-		} while (retries < 3 &&
-			 (!scsi_status_is_good(the_result) ||
-			  (scsi_status_is_check_condition(the_result) &&
-			  sense_valid && sshdr.sense_key == UNIT_ATTENTION)));



>> +    struct scsi_failure failure_defs[] = {
>> +        /* Do not retry Medium Not Present */
>> +        {
>> +            .sense = UNIT_ATTENTION,
>> +            .asc = 0x3A,
>> +            .ascq = SCMD_FAILURE_ASCQ_ANY,
>> +            .result = SAM_STAT_CHECK_CONDITION,
>> +        },
>> +        {
>> +            .sense = NOT_READY,
>> +            .asc = 0x3A,
>> +            .ascq = SCMD_FAILURE_ASCQ_ANY,
>> +            .result = SAM_STAT_CHECK_CONDITION,
>> +        },
>> +        /* Retry when scsi_status_is_good would return false 3 times */
>> +        {
>> +            .result = SCMD_FAILURE_STAT_ANY,
>> +            .allowed = 3,
>> +        },
>> +        {}
> 


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v12 04/20] scsi: sd: Have scsi-ml retry read_capacity_16 errors
  2023-11-16 11:39   ` John Garry
@ 2023-11-16 17:15     ` Mike Christie
  2023-11-16 17:57       ` Martin Wilck
  0 siblings, 1 reply; 32+ messages in thread
From: Mike Christie @ 2023-11-16 17:15 UTC (permalink / raw)
  To: John Garry, mwilck, bvanassche, hch, martin.petersen, linux-scsi,
	james.bottomley

On 11/16/23 5:39 AM, John Garry wrote:
> On 14/11/2023 01:37, Mike Christie wrote:
> 
> Feel free to pick up:
> Reviewed-by: John Garry <john.g.garry@oracle.com>
> 
>> +    the_result = scsi_execute_cmd(sdp, cmd, REQ_OP_DRV_IN, buffer,
>> +                      RC16_LEN, SD_TIMEOUT, sdkp->max_retries,
> 
> BTW, some might find it a little confusing that we have scsi_execute_cmd() retries arg as well as being able to pass exec_args.failure, and exec_args.failure may allow us to retry. Could this be improved, even in terms of arg or struct member naming/comments?

Will add some comments.

Martin W, if you are going to ask about this again, the answer is that
in a future patchset, we can kill the reties passed directly into
scsi_execute_cmd.

We could make scsi_decide_disposition's failures an array of
scsi_failure structs that gets checked in scsi_decide_disposition
and scsi_check_passthrough. We would need to add a function callout
to the scsi_failure struct for some of the more complicated failure
handling. That could then be used for some of the other scsi_execute_cmd
cases as well. For the normal/fast path case though we would need
something to avoid function pointers.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v12 04/20] scsi: sd: Have scsi-ml retry read_capacity_16 errors
  2023-11-16 17:15     ` Mike Christie
@ 2023-11-16 17:57       ` Martin Wilck
  0 siblings, 0 replies; 32+ messages in thread
From: Martin Wilck @ 2023-11-16 17:57 UTC (permalink / raw)
  To: Mike Christie, John Garry, bvanassche, hch, martin.petersen,
	linux-scsi, james.bottomley

On Thu, 2023-11-16 at 11:15 -0600, Mike Christie wrote:
> On 11/16/23 5:39 AM, John Garry wrote:
> > On 14/11/2023 01:37, Mike Christie wrote:
> > 
> > Feel free to pick up:
> > Reviewed-by: John Garry <john.g.garry@oracle.com>
> > 
> > > +    the_result = scsi_execute_cmd(sdp, cmd, REQ_OP_DRV_IN,
> > > buffer,
> > > +                      RC16_LEN, SD_TIMEOUT, sdkp->max_retries,
> > 
> > BTW, some might find it a little confusing that we have
> > scsi_execute_cmd() retries arg as well as being able to pass
> > exec_args.failure, and exec_args.failure may allow us to retry.
> > Could this be improved, even in terms of arg or struct member
> > naming/comments?
> 
> Will add some comments.
> 
> Martin W, if you are going to ask about this again, the answer is
> that
> in a future patchset, we can kill the reties passed directly into
> scsi_execute_cmd.
> 

Did I ask about this? ... I dimly remember ;-)

> We could make scsi_decide_disposition's failures an array of
> scsi_failure structs that gets checked in scsi_decide_disposition
> and scsi_check_passthrough. We would need to add a function callout
> to the scsi_failure struct for some of the more complicated failure
> handling. That could then be used for some of the other
> scsi_execute_cmd
> cases as well. For the normal/fast path case though we would need
> something to avoid function pointers.

I am not sure if I like this idea. scsi_decide_disposition() is
challenging to read already. I am not convinced that converting it into
a loop over "struct scsi_failure" would make it easier to understand.
I guess we'd need to see how it would look like. 

To my understanding, the two retry counts are currently orthogonal, the
one in scsi_execute_cmd() representing the standard mid-layer
"maybe_retry" case in scsi_decide_disposition, and the the other one
representing the additional "high level" passthrough retry. We need to
be careful when merging them.

Wrt the callout, I'd actually thought about that when looking at your
previous set, but I didn't want to interfere too much. I thought that
using callouts might simplify the scsi_failure handling, and might
actually make the new code easier to read. I can't judge possible
effects on performance.

Cheers,
Martin

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2023-11-16 17:57 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-11-14  1:37 [PATCH v12 00/20] scsi: Allow scsi_execute users to request retries Mike Christie
2023-11-14  1:37 ` [PATCH v12 01/20] scsi: Allow passthrough to request scsi-ml retries Mike Christie
2023-11-16 10:53   ` John Garry
2023-11-14  1:37 ` [PATCH v12 02/20] scsi: Have scsi-ml retry scsi_probe_lun errors Mike Christie
2023-11-16 11:14   ` John Garry
2023-11-16 16:38     ` Mike Christie
2023-11-14  1:37 ` [PATCH v12 03/20] scsi: retry INQUIRY after timeout Mike Christie
2023-11-14  1:37 ` [PATCH v12 04/20] scsi: sd: Have scsi-ml retry read_capacity_16 errors Mike Christie
2023-11-16 11:39   ` John Garry
2023-11-16 17:15     ` Mike Christie
2023-11-16 17:57       ` Martin Wilck
2023-11-14  1:37 ` [PATCH v12 05/20] scsi: Use separate buf for START_STOP in sd_spinup_disk Mike Christie
2023-11-14  1:37 ` [PATCH v12 06/20] scsi: Have scsi-ml retry sd_spinup_disk errors Mike Christie
2023-11-16 12:13   ` John Garry
2023-11-16 16:44     ` Mike Christie
2023-11-14  1:37 ` [PATCH v12 07/20] scsi: hp_sw: Have scsi-ml retry scsi_execute_cmd errors Mike Christie
2023-11-14  1:37 ` [PATCH v12 08/20] scsi: rdac: Have scsi-ml retry send_mode_select errors Mike Christie
2023-11-14  1:37 ` [PATCH v12 09/20] scsi: spi: Have scsi-ml retry spi_execute UAs Mike Christie
2023-11-14  1:37 ` [PATCH v12 10/20] scsi: sd: Have scsi-ml retry sd_sync_cache errors Mike Christie
2023-11-14  1:37 ` [PATCH v12 11/20] scsi: ch: Remove unit_attention Mike Christie
2023-11-14  1:37 ` [PATCH v12 12/20] scsi: ch: Have scsi-ml retry ch_do_scsi UAs Mike Christie
2023-11-14  1:37 ` [PATCH v12 13/20] scsi: Have scsi-ml retry scsi_mode_sense UAs Mike Christie
2023-11-14  1:37 ` [PATCH v12 14/20] scsi: Have scsi-ml retry scsi_report_lun_scan errors Mike Christie
2023-11-14  1:37 ` [PATCH v12 15/20] scsi: sd: Have pr commands retry UAs Mike Christie
2023-11-14  1:37 ` [PATCH v12 16/20] scsi: sd: Have scsi-ml retry read_capacity_10 errors Mike Christie
2023-11-14  1:37 ` [PATCH v12 17/20] scsi: ses: Have scsi-ml retry scsi_execute_cmd errors Mike Christie
2023-11-14  1:37 ` [PATCH v12 18/20] scsi: sr: Have scsi-ml retry get_sectorsize errors Mike Christie
2023-11-14  1:37 ` [PATCH v12 19/20] scsi: ufs: Have scsi-ml retry start stop errors Mike Christie
2023-11-14  1:37 ` [PATCH v12 20/20] scsi: Add kunit tests for scsi_check_passthrough Mike Christie
2023-11-14 18:38   ` Bart Van Assche
2023-11-16 11:02 ` [PATCH v12 00/20] scsi: Allow scsi_execute users to request retries John Garry
2023-11-16 16:29   ` Mike Christie

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox