[PATCH 0/4] scsi: passthrough fixes/improvements

public inbox for linux-scsi@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH 0/4] scsi: passthrough fixes/improvements
@ 2022-08-10  3:41 Mike Christie
  2022-08-10  3:41 ` [PATCH 1/4] scsi: Fix passthrough retry counter handling Mike Christie
                   ` (3 more replies)
  0 siblings, 4 replies; 17+ messages in thread
From: Mike Christie @ 2022-08-10  3:41 UTC (permalink / raw)
  To: bvanassche, mwilck, hch, martin.petersen, linux-scsi,
	james.bottomley

The following patches were made over Linus's tree and fix a couple
bugs/issues with passthrough commands:

1. Martin is hitting a bug where DID_TIME_OUT is not retried by
scsi-ml because it hits the general passthrough check.

2. I had mentioned to Martin that I was hitting a similar bug with
transport testing. It turns out my issue was slighlty different and
was a regression where we were always setting allowed to 0 for
passthrough commands so I failed before the noretry test.

3. I was hitting an issue where pr_ops calls are returning what
some apps considered a failure when it was just a UA that was a
result of a previous pr_op.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 1/4] scsi: Fix passthrough retry counter handling
  2022-08-10  3:41 [PATCH 0/4] scsi: passthrough fixes/improvements Mike Christie
@ 2022-08-10  3:41 ` Mike Christie
  2022-08-11 12:16   ` Christoph Hellwig
  2022-08-10  3:41 ` [PATCH 2/4] scsi: Add new SUBMITTED types for passthrough Mike Christie
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 17+ messages in thread
From: Mike Christie @ 2022-08-10  3:41 UTC (permalink / raw)
  To: bvanassche, mwilck, hch, martin.petersen, linux-scsi,
	james.bottomley
  Cc: Mike Christie

Passthrough users will set the scsi_cmnd->allowed value and were
expecting up to $allowed retries. The problem is that before:

commit 6aded12b10e0 ("scsi: core: Remove struct scsi_request")

we used to set the retries on the scsi_request then copy them over to
scsi_cmnd->allowed in scsi_setup_scsi_cmnd. With that patch we now set
scsi_cmnd->allowed to 0 in scsi_prepare_cmd then for passthrough
commands never set it again.

This patch adds a check for passthrough commands where if set then we
leave the allowed field alone since the submitter already set it.

Fixes: 6aded12b10e0 ("scsi: core: Remove struct scsi_request")
Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/scsi/scsi_lib.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 4dbd29ab1dcc..3ef85c8b689d 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1541,8 +1541,10 @@ static blk_status_t scsi_prepare_cmd(struct request *req)
 
 	scsi_init_command(sdev, cmd);
 
+	if (!blk_rq_is_passthrough(req))
+		cmd->allowed = 0;
+
 	cmd->eh_eflags = 0;
-	cmd->allowed = 0;
 	cmd->prot_type = 0;
 	cmd->prot_flags = 0;
 	cmd->submitter = 0;
-- 
2.18.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 2/4] scsi: Add new SUBMITTED types for passthrough
  2022-08-10  3:41 [PATCH 0/4] scsi: passthrough fixes/improvements Mike Christie
  2022-08-10  3:41 ` [PATCH 1/4] scsi: Fix passthrough retry counter handling Mike Christie
@ 2022-08-10  3:41 ` Mike Christie
  2022-08-11 12:21   ` Christoph Hellwig
  2022-08-10  3:41 ` [PATCH 3/4] scsi: Internally retry scsi_execute commands Mike Christie
  2022-08-10  3:41 ` [PATCH 4/4] scsi: Handle UAs for pr_ops Mike Christie
  3 siblings, 1 reply; 17+ messages in thread
From: Mike Christie @ 2022-08-10  3:41 UTC (permalink / raw)
  To: bvanassche, mwilck, hch, martin.petersen, linux-scsi,
	james.bottomley
  Cc: Mike Christie

This adds 2 new SUBMITTED types so we know if a command was queued
because of a scsi_execute user vs SG IO/tape/cd.

In the next patch we can then handle errors differently based on what
submitted the cmd. For scsi_execute we can let the scsi error handler
retry like normal if the user has requested retries. And for other users
we can fail fast for device errors like is expected by users like SG IO.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/block/pktcdvd.c            |  3 ++-
 drivers/scsi/scsi_bsg.c            |  3 ++-
 drivers/scsi/scsi_error.c          |  3 ++-
 drivers/scsi/scsi_ioctl.c          |  6 ++++--
 drivers/scsi/scsi_lib.c            | 23 ++++++++++++++++-------
 drivers/scsi/sd.c                  |  3 ++-
 drivers/scsi/sg.c                  |  3 ++-
 drivers/scsi/sr.c                  |  3 ++-
 drivers/scsi/st.c                  |  5 +++--
 drivers/target/target_core_pscsi.c |  5 +++--
 include/scsi/scsi_cmnd.h           |  5 ++++-
 11 files changed, 42 insertions(+), 20 deletions(-)

diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c
index 4cea3b08087e..0bc69c56f36b 100644
--- a/drivers/block/pktcdvd.c
+++ b/drivers/block/pktcdvd.c
@@ -689,7 +689,8 @@ static int pkt_generic_packet(struct pktcdvd_device *pd, struct packet_command *
 	int ret = 0;
 
 	rq = scsi_alloc_request(q, (cgc->data_direction == CGC_DATA_WRITE) ?
-			     REQ_OP_DRV_OUT : REQ_OP_DRV_IN, 0);
+				REQ_OP_DRV_OUT : REQ_OP_DRV_IN, 0,
+				SUBMITTED_BY_BLOCK_PT);
 	if (IS_ERR(rq))
 		return PTR_ERR(rq);
 	scmd = blk_mq_rq_to_pdu(rq);
diff --git a/drivers/scsi/scsi_bsg.c b/drivers/scsi/scsi_bsg.c
index 96ee35256a16..2d4ae1383892 100644
--- a/drivers/scsi/scsi_bsg.c
+++ b/drivers/scsi/scsi_bsg.c
@@ -26,7 +26,8 @@ static int scsi_bsg_sg_io_fn(struct request_queue *q, struct sg_io_v4 *hdr,
 	}
 
 	rq = scsi_alloc_request(q, hdr->dout_xfer_len ?
-				REQ_OP_DRV_OUT : REQ_OP_DRV_IN, 0);
+				REQ_OP_DRV_OUT : REQ_OP_DRV_IN, 0,
+				SUBMITTED_BY_BLOCK_PT);
 	if (IS_ERR(rq))
 		return PTR_ERR(rq);
 	rq->timeout = timeout;
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 448748e3fba5..ac4471e33a9c 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -2025,7 +2025,8 @@ static void scsi_eh_lock_door(struct scsi_device *sdev)
 	struct scsi_cmnd *scmd;
 	struct request *req;
 
-	req = scsi_alloc_request(sdev->request_queue, REQ_OP_DRV_IN, 0);
+	req = scsi_alloc_request(sdev->request_queue, REQ_OP_DRV_IN, 0,
+				 SUBMITTED_BY_BLOCK_PT);
 	if (IS_ERR(req))
 		return;
 	scmd = blk_mq_rq_to_pdu(req);
diff --git a/drivers/scsi/scsi_ioctl.c b/drivers/scsi/scsi_ioctl.c
index 729e309e6034..d8bb638f835d 100644
--- a/drivers/scsi/scsi_ioctl.c
+++ b/drivers/scsi/scsi_ioctl.c
@@ -435,7 +435,8 @@ static int sg_io(struct scsi_device *sdev, struct sg_io_hdr *hdr, fmode_t mode)
 		at_head = 1;
 
 	rq = scsi_alloc_request(sdev->request_queue, writing ?
-			     REQ_OP_DRV_OUT : REQ_OP_DRV_IN, 0);
+				REQ_OP_DRV_OUT : REQ_OP_DRV_IN, 0,
+				SUBMITTED_BY_BLOCK_PT);
 	if (IS_ERR(rq))
 		return PTR_ERR(rq);
 	scmd = blk_mq_rq_to_pdu(rq);
@@ -546,7 +547,8 @@ static int sg_scsi_ioctl(struct request_queue *q, fmode_t mode,
 
 	}
 
-	rq = scsi_alloc_request(q, in_len ? REQ_OP_DRV_OUT : REQ_OP_DRV_IN, 0);
+	rq = scsi_alloc_request(q, in_len ? REQ_OP_DRV_OUT : REQ_OP_DRV_IN, 0,
+				SUBMITTED_BY_BLOCK_PT);
 	if (IS_ERR(rq)) {
 		err = PTR_ERR(rq);
 		goto error_free_buffer;
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 3ef85c8b689d..c689c28d6181 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -210,9 +210,10 @@ int __scsi_execute(struct scsi_device *sdev, const unsigned char *cmd,
 	int ret;
 
 	req = scsi_alloc_request(sdev->request_queue,
-			data_direction == DMA_TO_DEVICE ?
-			REQ_OP_DRV_OUT : REQ_OP_DRV_IN,
-			rq_flags & RQF_PM ? BLK_MQ_REQ_PM : 0);
+				 data_direction == DMA_TO_DEVICE ?
+				 REQ_OP_DRV_OUT : REQ_OP_DRV_IN,
+				 rq_flags & RQF_PM ? BLK_MQ_REQ_PM : 0,
+				 SUBMITTED_BY_SCSI_EXEC);
 	if (IS_ERR(req))
 		return PTR_ERR(req);
 
@@ -226,6 +227,7 @@ int __scsi_execute(struct scsi_device *sdev, const unsigned char *cmd,
 	scmd->cmd_len = COMMAND_SIZE(cmd[0]);
 	memcpy(scmd->cmnd, cmd, scmd->cmd_len);
 	scmd->allowed = retries;
+	scmd->submitter = SUBMITTED_BY_SCSI_EXEC;
 	req->timeout = timeout;
 	req->cmd_flags |= flags;
 	req->rq_flags |= rq_flags | RQF_QUIET;
@@ -1119,13 +1121,18 @@ static void scsi_initialize_rq(struct request *rq)
 }
 
 struct request *scsi_alloc_request(struct request_queue *q, blk_opf_t opf,
-				   blk_mq_req_flags_t flags)
+				   blk_mq_req_flags_t flags,
+				   enum scsi_cmnd_submitter submitter)
 {
+	struct scsi_cmnd *cmd;
 	struct request *rq;
 
 	rq = blk_mq_alloc_request(q, opf, flags);
 	if (!IS_ERR(rq))
 		scsi_initialize_rq(rq);
+
+	cmd = blk_mq_rq_to_pdu(rq);
+	cmd->submitter = submitter;
 	return rq;
 }
 EXPORT_SYMBOL_GPL(scsi_alloc_request);
@@ -1541,13 +1548,14 @@ static blk_status_t scsi_prepare_cmd(struct request *req)
 
 	scsi_init_command(sdev, cmd);
 
-	if (!blk_rq_is_passthrough(req))
+	if (!blk_rq_is_passthrough(req)) {
 		cmd->allowed = 0;
+		cmd->submitter = SUBMITTED_BY_BLOCK_LAYER;
+	}
 
 	cmd->eh_eflags = 0;
 	cmd->prot_type = 0;
 	cmd->prot_flags = 0;
-	cmd->submitter = 0;
 	memset(&cmd->sdb, 0, sizeof(cmd->sdb));
 	cmd->underflow = 0;
 	cmd->transfersize = 0;
@@ -1605,6 +1613,8 @@ static void scsi_done_internal(struct scsi_cmnd *cmd, bool complete_directly)
 
 	switch (cmd->submitter) {
 	case SUBMITTED_BY_BLOCK_LAYER:
+	case SUBMITTED_BY_SCSI_EXEC:
+	case SUBMITTED_BY_BLOCK_PT:
 		break;
 	case SUBMITTED_BY_SCSI_ERROR_HANDLER:
 		return scsi_eh_done(cmd);
@@ -1741,7 +1751,6 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
 
 	scsi_set_resid(cmd, 0);
 	memset(cmd->sense_buffer, 0, SCSI_SENSE_BUFFERSIZE);
-	cmd->submitter = SUBMITTED_BY_BLOCK_LAYER;
 
 	blk_mq_start_request(req);
 	reason = scsi_dispatch_cmd(cmd);
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 8f79fa6318fe..550df58f228a 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3637,7 +3637,8 @@ static int sd_submit_start(struct scsi_disk *sdkp, u8 cmd[], u8 cmd_len)
 	struct request *req;
 	struct scsi_cmnd *scmd;
 
-	req = scsi_alloc_request(q, REQ_OP_DRV_IN, BLK_MQ_REQ_PM);
+	req = scsi_alloc_request(q, REQ_OP_DRV_IN, BLK_MQ_REQ_PM,
+				 SUBMITTED_BY_BLOCK_PT);
 	if (IS_ERR(req))
 		return PTR_ERR(req);
 
diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 340b050ad28d..3f10cedf6f03 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -1744,7 +1744,8 @@ sg_start_req(Sg_request *srp, unsigned char *cmd)
 	 * not expect an EWOULDBLOCK from this condition.
 	 */
 	rq = scsi_alloc_request(q, hp->dxfer_direction == SG_DXFER_TO_DEV ?
-			REQ_OP_DRV_OUT : REQ_OP_DRV_IN, 0);
+				REQ_OP_DRV_OUT : REQ_OP_DRV_IN, 0,
+				SUBMITTED_BY_BLOCK_PT);
 	if (IS_ERR(rq))
 		return PTR_ERR(rq);
 	scmd = blk_mq_rq_to_pdu(rq);
diff --git a/drivers/scsi/sr.c b/drivers/scsi/sr.c
index a278b739d0c5..8b9f44079aa6 100644
--- a/drivers/scsi/sr.c
+++ b/drivers/scsi/sr.c
@@ -933,7 +933,8 @@ static int sr_read_cdda_bpc(struct cdrom_device_info *cdi, void __user *ubuf,
 	struct bio *bio;
 	int ret;
 
-	rq = scsi_alloc_request(disk->queue, REQ_OP_DRV_IN, 0);
+	rq = scsi_alloc_request(disk->queue, REQ_OP_DRV_IN, 0,
+				SUBMITTED_BY_BLOCK_PT);
 	if (IS_ERR(rq))
 		return PTR_ERR(rq);
 	scmd = blk_mq_rq_to_pdu(rq);
diff --git a/drivers/scsi/st.c b/drivers/scsi/st.c
index 850172a2b8f1..b981a186e195 100644
--- a/drivers/scsi/st.c
+++ b/drivers/scsi/st.c
@@ -545,8 +545,9 @@ static int st_scsi_execute(struct st_request *SRpnt, const unsigned char *cmd,
 	struct scsi_cmnd *scmd;
 
 	req = scsi_alloc_request(SRpnt->stp->device->request_queue,
-			data_direction == DMA_TO_DEVICE ?
-			REQ_OP_DRV_OUT : REQ_OP_DRV_IN, 0);
+				 data_direction == DMA_TO_DEVICE ?
+				 REQ_OP_DRV_OUT : REQ_OP_DRV_IN, 0,
+				 SUBMITTED_BY_BLOCK_PT);
 	if (IS_ERR(req))
 		return PTR_ERR(req);
 	scmd = blk_mq_rq_to_pdu(req);
diff --git a/drivers/target/target_core_pscsi.c b/drivers/target/target_core_pscsi.c
index e6a967ddc08c..a5886ddcf075 100644
--- a/drivers/target/target_core_pscsi.c
+++ b/drivers/target/target_core_pscsi.c
@@ -941,8 +941,9 @@ pscsi_execute_cmd(struct se_cmd *cmd)
 	sense_reason_t ret;
 
 	req = scsi_alloc_request(pdv->pdv_sd->request_queue,
-			cmd->data_direction == DMA_TO_DEVICE ?
-			REQ_OP_DRV_OUT : REQ_OP_DRV_IN, 0);
+				 cmd->data_direction == DMA_TO_DEVICE ?
+				 REQ_OP_DRV_OUT : REQ_OP_DRV_IN, 0,
+				 SUBMITTED_BY_BLOCK_PT);
 	if (IS_ERR(req))
 		return TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE;
 
diff --git a/include/scsi/scsi_cmnd.h b/include/scsi/scsi_cmnd.h
index bac55decf900..1ade0d454874 100644
--- a/include/scsi/scsi_cmnd.h
+++ b/include/scsi/scsi_cmnd.h
@@ -63,6 +63,8 @@ enum scsi_cmnd_submitter {
 	SUBMITTED_BY_BLOCK_LAYER = 0,
 	SUBMITTED_BY_SCSI_ERROR_HANDLER = 1,
 	SUBMITTED_BY_SCSI_RESET_IOCTL = 2,
+	SUBMITTED_BY_SCSI_EXEC = 3,
+	SUBMITTED_BY_BLOCK_PT = 4,
 } __packed;
 
 struct scsi_cmnd {
@@ -387,6 +389,7 @@ extern void scsi_build_sense(struct scsi_cmnd *scmd, int desc,
 			     u8 key, u8 asc, u8 ascq);
 
 struct request *scsi_alloc_request(struct request_queue *q, blk_opf_t opf,
-				   blk_mq_req_flags_t flags);
+				   blk_mq_req_flags_t flags,
+				   enum scsi_cmnd_submitter submitter);
 
 #endif /* _SCSI_SCSI_CMND_H */
-- 
2.18.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 3/4] scsi: Internally retry scsi_execute commands
  2022-08-10  3:41 [PATCH 0/4] scsi: passthrough fixes/improvements Mike Christie
  2022-08-10  3:41 ` [PATCH 1/4] scsi: Fix passthrough retry counter handling Mike Christie
  2022-08-10  3:41 ` [PATCH 2/4] scsi: Add new SUBMITTED types for passthrough Mike Christie
@ 2022-08-10  3:41 ` Mike Christie
  2022-08-10 10:46   ` Martin Wilck
  2022-08-10  3:41 ` [PATCH 4/4] scsi: Handle UAs for pr_ops Mike Christie
  3 siblings, 1 reply; 17+ messages in thread
From: Mike Christie @ 2022-08-10  3:41 UTC (permalink / raw)
  To: bvanassche, mwilck, hch, martin.petersen, linux-scsi,
	james.bottomley
  Cc: Mike Christie

In several places like LU setup and pr_ops we will hit the noretry code
path because we do not retry any passthrough commands that hit device
errors even though scsi-ml thinks the command is retryable.

This has us only fast fail commands that hit device errors that have been
submitted through the block layer directly and not via scsi_execute. This
allows SG IO and other users to continue to get fast failures and all
device errors returned to them, and scsi_execute users will get their
retries they had requested.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/scsi/scsi_error.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index ac4471e33a9c..573d926220c4 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -1806,7 +1806,8 @@ bool scsi_noretry_cmd(struct scsi_cmnd *scmd)
 	 * assume caller has checked sense and determined
 	 * the check condition was retryable.
 	 */
-	if (req->cmd_flags & REQ_FAILFAST_DEV || blk_rq_is_passthrough(req))
+	if (req->cmd_flags & REQ_FAILFAST_DEV ||
+	    scmd->submitter == SUBMITTED_BY_BLOCK_PT)
 		return true;
 
 	return false;
-- 
2.18.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 4/4] scsi: Handle UAs for pr_ops
  2022-08-10  3:41 [PATCH 0/4] scsi: passthrough fixes/improvements Mike Christie
                   ` (2 preceding siblings ...)
  2022-08-10  3:41 ` [PATCH 3/4] scsi: Internally retry scsi_execute commands Mike Christie
@ 2022-08-10  3:41 ` Mike Christie
  3 siblings, 0 replies; 17+ messages in thread
From: Mike Christie @ 2022-08-10  3:41 UTC (permalink / raw)
  To: bvanassche, mwilck, hch, martin.petersen, linux-scsi,
	james.bottomley
  Cc: Mike Christie

There's several UAs for PGRs that will occur when releasing/preempting
reservations. If we get these with the pr_ops right now, we just log a
failure and return. For dm-multipath this is a waste because it will
end up trying a new path since it doesn't know what a UA is. For
userspace it's also a pain, because they end up just retrying. This
patch adds the PGR related UAs so scsi-ml just retries for us.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/scsi/scsi_error.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 573d926220c4..59528138328d 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -634,6 +634,14 @@ enum scsi_disposition scsi_check_sense(struct scsi_cmnd *scmd)
 		if (scmd->device->allow_restart &&
 		    (sshdr.asc == 0x04) && (sshdr.ascq == 0x02))
 			return FAILED;
+		/*
+		 * Registration/reservations preempted or reservation
+		 * released.
+		 */
+		if (sshdr.asc == 0x2a &&
+		    (sshdr.ascq == 0x03 || sshdr.ascq == 0x04 ||
+		     sshdr.ascq == 0x05))
+			return NEEDS_RETRY;
 		/*
 		 * Pass the UA upwards for a determination in the completion
 		 * functions.
-- 
2.18.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/4] scsi: Internally retry scsi_execute commands
  2022-08-10  3:41 ` [PATCH 3/4] scsi: Internally retry scsi_execute commands Mike Christie
@ 2022-08-10 10:46   ` Martin Wilck
  2022-08-10 17:06     ` Mike Christie
  0 siblings, 1 reply; 17+ messages in thread
From: Martin Wilck @ 2022-08-10 10:46 UTC (permalink / raw)
  To: Mike Christie, bvanassche, hch, martin.petersen, linux-scsi,
	james.bottomley
  Cc: Hannes Reinecke

Hi Mike,

On Tue, 2022-08-09 at 22:41 -0500, Mike Christie wrote:
> In several places like LU setup and pr_ops we will hit the noretry
> code
> path because we do not retry any passthrough commands that hit device
> errors even though scsi-ml thinks the command is retryable.
> 
> This has us only fast fail commands that hit device errors that have
> been
> submitted through the block layer directly and not via scsi_execute.
> This
> allows SG IO and other users to continue to get fast failures and all
> device errors returned to them, and scsi_execute users will get their
> retries they had requested.
> 
> Signed-off-by: Mike Christie <michael.christie@oracle.com>

Thanks a lot. I like the general approach, but I am concerned that by
treating every command sent by scsi_execute() or scsi_execute_req() as
a retryable command, we may break some callers, or at least modify
their semantics in unexpected ways. For example, spi_execute(),
scsi_probe_lun(), scsi_report_lun_scan() currently retry only on UA.
With this change, these commands will be retried in additional cases,
without the caller noticing (see 949bf797595fc ("[SCSI] fix command
retries in spi_transport class"). It isn't obvious to me that this is
correct in all affected cases. 

Note that the retry logic of the mid level may change depending on the
installed device handler. For example, ALUA devices will endlessly
retry UA with ASC=29, which some callers explicitly look for.

I believe we need to review all callers that have their own retry loop
and /or error handling logic. This includes sd_spinup_disk(),
sd_sync_cache(), scsi_test_unit_ready(). SCSI device handlers treat
some sense keys in special ways, or may retry commands on another
member of a port group (see alua_rtpg()).

DID_TIME_OUT is a general concern - no current caller of scsi_execute()
expects timed-out commands to be retried, and doing so has the
potential of slowing down operations a lot. I am aware that my recent
patch changed exactly this for scsi_probe_lun(), but doing the same
thing for every scsi_execute() invocation sounds at least somewhat
dangerous.

Bottom line: I wonder if SUBMITTED_BY_SCSI_EXEC is suffient to
determine that a command should be retried like an ordinary block level
command.

Regards,
Martin

> ---
>  drivers/scsi/scsi_error.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> index ac4471e33a9c..573d926220c4 100644
> --- a/drivers/scsi/scsi_error.c
> +++ b/drivers/scsi/scsi_error.c
> @@ -1806,7 +1806,8 @@ bool scsi_noretry_cmd(struct scsi_cmnd *scmd)
>          * assume caller has checked sense and determined
>          * the check condition was retryable.
>          */
> -       if (req->cmd_flags & REQ_FAILFAST_DEV ||
> blk_rq_is_passthrough(req))
> +       if (req->cmd_flags & REQ_FAILFAST_DEV ||
> +           scmd->submitter == SUBMITTED_BY_BLOCK_PT)
>                 return true;
>  
>         return false;

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/4] scsi: Internally retry scsi_execute commands
  2022-08-10 10:46   ` Martin Wilck
@ 2022-08-10 17:06     ` Mike Christie
  2022-08-10 17:38       ` Mike Christie
                         ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Mike Christie @ 2022-08-10 17:06 UTC (permalink / raw)
  To: Martin Wilck, bvanassche, hch, martin.petersen, linux-scsi,
	james.bottomley
  Cc: Hannes Reinecke

On 8/10/22 5:46 AM, Martin Wilck wrote:
> Hi Mike,
> 
> On Tue, 2022-08-09 at 22:41 -0500, Mike Christie wrote:
>> In several places like LU setup and pr_ops we will hit the noretry
>> code
>> path because we do not retry any passthrough commands that hit device
>> errors even though scsi-ml thinks the command is retryable.
>>
>> This has us only fast fail commands that hit device errors that have
>> been
>> submitted through the block layer directly and not via scsi_execute.
>> This
>> allows SG IO and other users to continue to get fast failures and all
>> device errors returned to them, and scsi_execute users will get their
>> retries they had requested.
>>
>> Signed-off-by: Mike Christie <michael.christie@oracle.com>
> 
> Thanks a lot. I like the general approach, but I am concerned that by
> treating every command sent by scsi_execute() or scsi_execute_req() as
> a retryable command, we may break some callers, or at least modify
> their semantics in unexpected ways. For example, spi_execute(),
> scsi_probe_lun(), scsi_report_lun_scan() currently retry only on UA.
> With this change, these commands will be retried in additional cases,
> without the caller noticing (see 949bf797595fc ("[SCSI] fix command
> retries in spi_transport class"). It isn't obvious to me that this is
> correct in all affected cases. 
Let's make sure we are on the same page, because I might agree with you.
But for possible solutions we need to agree what this patch actually changes.

We currently have 3 places we get retries from:

1. scsi_decide_disposition - For passthrough commands the patch only changes
the behavior when scsi_decide_disposition gets NEED_RETRY, retries < allowed,
and REQ_FAILFAST_DEV is not set.

It's really specific and not as general as I think you thought it was.

2. scsi_io_completion - Passthrough commands are never retried here.

3. scsi_execute users driving retries.

For your examples above:

- The scan/probe functions ask for 3 retries and so with patch1 we will now
get 3 x 3 retries for errors that hit #1.

So I agree this is really wrong for DID_TIME_OUT.

- There is no behavior change for spi because it uses REQ_FAILFAST_DEV.

> 
> Note that the retry logic of the mid level may change depending on the
> installed device handler. For example, ALUA devices will endlessly
> retry UA with ASC=29, which some callers explicitly look for.

There is no behavior change with my patch for ASC=29 case, because it
uses ADD_TO_MLQUEUE and we don't run scsi_noretry_cmd for that error.

It could change how 0x04/0x0a is handled because it uses NEEDS_RETRY.
However, scsi_dh_alua uses REQ_FAILFAST_DEV so we do not retry in
scsi_noretry_cmd like before.

> 
> I believe we need to review all callers that have their own retry loop
> and /or error handling logic. This includes sd_spinup_disk(),
> sd_sync_cache(), scsi_test_unit_ready(). SCSI device handlers treat
> some sense keys in special ways, or may retry commands on another
> member of a port group (see alua_rtpg()).

There is no change in behavior for the alua one but agree with the general
comment.

> 
> DID_TIME_OUT is a general concern - no current caller of scsi_execute()
> expects timed-out commands to be retried, and doing so has the
> potential of slowing down operations a lot. I am aware that my recent
> patch changed exactly this for scsi_probe_lun(), but doing the same
> thing for every scsi_execute() invocation sounds at least somewhat
> dangerous.

Agree this patch is wrong:
- With patch1 that fixes scsi_cmnd->allowed we can end up with N and M
DID_TIME_OUT retries and that can get crazy. So agree with that.

- For the general idea of do we always want to retry DID_TIME_OUT, I can
I also see your point.

- After reading your mail and thinking about patch 4, I was thinking that
this is wrong for patch 4 as well. For the pr_ops case we want the opposite
of what you were mentioning in here. I actually want scsi-ml to retry
all UAs for me or allow me to retry all UAs and not just handle the specific
PGR related ones.

I'm not sure where to go from here:

1. Just have the callers drive extra retries like we do now.

I guess I've come around and are ok with this.

With patch1, scsi_cmnd->allowed and scsi_execute works for me, so my
previous comment about scsi_probe_lun needing to retry that case does
not apply since scsi-ml will do it for me.

I do think we need to retry your case in other places though.

2. Instead of trying to make it general for all scsi_execute_users, we can
add SCMD bits for specific cases like DID_TIME_OUT or a SCMD bit that tells
scsi_noretry_cmd to not always fail passthrough commands just because they
are passthrough. It would work the opposite of the FASTFAIL bits where instead
of failing fast, we retry.

I think because the cases scsi_noretry_cmd is used for are really specific cases
(scsi_decide_disposition sees NEEDS_RETRY, retries < allowed, and REQ_FAILFAST_DEV
is not set) that might not be very useful. I'm not sure we want to add a bunch of
cases specific to scsi_execute callers to scsi_check_sense? I don't think we do.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/4] scsi: Internally retry scsi_execute commands
  2022-08-10 17:06     ` Mike Christie
@ 2022-08-10 17:38       ` Mike Christie
  2022-08-11 12:28         ` Christoph Hellwig
  2022-08-11  9:56       ` Martin Wilck
  2022-08-11 12:27       ` Christoph Hellwig
  2 siblings, 1 reply; 17+ messages in thread
From: Mike Christie @ 2022-08-10 17:38 UTC (permalink / raw)
  To: Martin Wilck, bvanassche, hch, martin.petersen, linux-scsi,
	james.bottomley
  Cc: Hannes Reinecke

On 8/10/22 12:06 PM, Mike Christie wrote:
> I think because the cases scsi_noretry_cmd is used for are really specific cases
> (scsi_decide_disposition sees NEEDS_RETRY, retries < allowed, and REQ_FAILFAST_DEV
> is not set) that might not be very useful. I'm not sure we want to add a bunch of
> cases specific to scsi_execute callers to scsi_check_sense? I don't think we do.

To be clear, I mean for this approach to be generally useful we would have to
add the cases scsi_execute callers are retrying in scsi_check_sense. We could
then remove the scsi_execute caller driven retries and only have scsi_decide_disposition
retry.

It sounded really nice at first, but to do that I would have a bunch of cases
that might be specific to pr_ops, alua or scanning, etc. Then I also had cases where
user1 does not want to retry but user2 did, so I have to add more SCMD bits. So, in
the end it will get messier than you initially think.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/4] scsi: Internally retry scsi_execute commands
  2022-08-10 17:06     ` Mike Christie
  2022-08-10 17:38       ` Mike Christie
@ 2022-08-11  9:56       ` Martin Wilck
  2022-08-11 16:15         ` Mike Christie
  2022-08-11 12:27       ` Christoph Hellwig
  2 siblings, 1 reply; 17+ messages in thread
From: Martin Wilck @ 2022-08-11  9:56 UTC (permalink / raw)
  To: Mike Christie, bvanassche, hch, martin.petersen, linux-scsi,
	james.bottomley
  Cc: Hannes Reinecke

On Wed, 2022-08-10 at 12:06 -0500, Mike Christie wrote:
> On 8/10/22 5:46 AM, Martin Wilck wrote:
> > Hi Mike,
> > 
> > On Tue, 2022-08-09 at 22:41 -0500, Mike Christie wrote:
> > > In several places like LU setup and pr_ops we will hit the
> > > noretry
> > > code
> > > path because we do not retry any passthrough commands that hit
> > > device
> > > errors even though scsi-ml thinks the command is retryable.
> > > 
> > > This has us only fast fail commands that hit device errors that
> > > have
> > > been
> > > submitted through the block layer directly and not via
> > > scsi_execute.
> > > This
> > > allows SG IO and other users to continue to get fast failures and
> > > all
> > > device errors returned to them, and scsi_execute users will get
> > > their
> > > retries they had requested.
> > > 
> > > Signed-off-by: Mike Christie <michael.christie@oracle.com>
> > 
> > Thanks a lot. I like the general approach, but I am concerned that
> > by
> > treating every command sent by scsi_execute() or scsi_execute_req()
> > as
> > a retryable command, we may break some callers, or at least modify
> > their semantics in unexpected ways. For example, spi_execute(),
> > scsi_probe_lun(), scsi_report_lun_scan() currently retry only on
> > UA.
> > With this change, these commands will be retried in additional
> > cases,
> > without the caller noticing (see 949bf797595fc ("[SCSI] fix command
> > retries in spi_transport class"). It isn't obvious to me that this
> > is
> > correct in all affected cases. 
> Let's make sure we are on the same page, because I might agree with
> you.
> But for possible solutions we need to agree what this patch actually
> changes.
> 
> We currently have 3 places we get retries from:
> 
> 1. scsi_decide_disposition - For passthrough commands the patch only
> changes
> the behavior when scsi_decide_disposition gets NEED_RETRY, retries <
> allowed,
> and REQ_FAILFAST_DEV is not set.
> 
> It's really specific and not as general as I think you thought it
> was.

I was aware of it, but I confused matters for some cases :-/

> 2. scsi_io_completion - Passthrough commands are never retried here.
> 
> 3. scsi_execute users driving retries.

There's also the error handler retrying commands e.g. after a
successful abort. This is the DID_TIME_OUT case, actually -
scsi_decide_disposition() never calls scsi_noretry_cmd() for
DID_TIME_OUT.

> For your examples above:
>  
> - The scan/probe functions ask for 3 retries and so with patch1 we
> will now
> get 3 x 3 retries for errors that hit #1.
> 
> So I agree this is really wrong for DID_TIME_OUT.
> 
> - There is no behavior change for spi because it uses
> REQ_FAILFAST_DEV.

Overlooked that, thanks.

> > 
> > Note that the retry logic of the mid level may change depending on
> > the
> > installed device handler. For example, ALUA devices will endlessly
> > retry UA with ASC=29, which some callers explicitly look for.
> 
> There is no behavior change with my patch for ASC=29 case, because it
> uses ADD_TO_MLQUEUE and we don't run scsi_noretry_cmd for that error.

Right, thanks for pointing that out. I saw it but didn't realize what
it meant.

> It could change how 0x04/0x0a is handled because it uses NEEDS_RETRY.
> However, scsi_dh_alua uses REQ_FAILFAST_DEV so we do not retry in
> scsi_noretry_cmd like before.

Not quite following you here - alua_check_sense() is called for any
command, not just those submitted from the ALUA code.
> 

> > I believe we need to review all callers that have their own retry
> > loop
> > and /or error handling logic. This includes sd_spinup_disk(),
> > sd_sync_cache(), scsi_test_unit_ready(). SCSI device handlers treat
> > some sense keys in special ways, or may retry commands on another
> > member of a port group (see alua_rtpg()).
> 
> There is no change in behavior for the alua one but agree with the
> general
> comment.
> 
> > 
> > DID_TIME_OUT is a general concern - no current caller of
> > scsi_execute()
> > expects timed-out commands to be retried, and doing so has the
> > potential of slowing down operations a lot. I am aware that my
> > recent
> > patch changed exactly this for scsi_probe_lun(), but doing the same
> > thing for every scsi_execute() invocation sounds at least somewhat
> > dangerous.
> 
> Agree this patch is wrong:
> - With patch1 that fixes scsi_cmnd->allowed we can end up with N and
> M
> DID_TIME_OUT retries and that can get crazy. So agree with that.
> 
> - For the general idea of do we always want to retry DID_TIME_OUT, I
> can
> I also see your point.
> 
> - After reading your mail and thinking about patch 4, I was thinking
> that
> this is wrong for patch 4 as well. For the pr_ops case we want the
> opposite
> of what you were mentioning in here. I actually want scsi-ml to retry
> all UAs for me or allow me to retry all UAs and not just handle the
> specific
> PGR related ones.
> 
> I'm not sure where to go from here:
> 
> 1. Just have the callers drive extra retries like we do now.
> 
> I guess I've come around and are ok with this.

Me too, but I'm not sure about Christoph.

> 
> With patch1, scsi_cmnd->allowed and scsi_execute works for me, so my
> previous comment about scsi_probe_lun needing to retry that case does
> not apply since scsi-ml will do it for me.
> 
> I do think we need to retry your case in other places though.
> 
> 2. Instead of trying to make it general for all scsi_execute_users,
> we can
> add SCMD bits for specific cases like DID_TIME_OUT or a SCMD bit that
> tells
> scsi_noretry_cmd to not always fail passthrough commands just because
> they
> are passthrough. It would work the opposite of the FASTFAIL bits
> where instead
> of failing fast, we retry.
> 
> I think because the cases scsi_noretry_cmd is used for are really
> specific cases
> (scsi_decide_disposition sees NEEDS_RETRY, retries < allowed, and
> REQ_FAILFAST_DEV
> is not set) that might not be very useful. 

I don't think it's _that_ speficic. (retries < allowed) is the default
case, at least for the first failure. REQ_FAILFAST_DEV has very few
users except for the device handlers, and NEEDS_RETRY is a rather
frequently used disposition.

> I'm not sure we want to add a bunch of
> cases specific to scsi_execute callers to scsi_check_sense? I don't
> think we do.

If we don't want to fall back to 1. above, I think we can go a long way
with two additional flags for scsi_execute() callers:

 a) indicate whether DID_TIME_OUT should be retried by the error
handler (default: no),
 b) indicate whether UA (and NOT READY?) should be retried on the ML
(default: current behavior)

The rationale for b) is that (IIRC from yesterday's code inspection all
callers that check sense codes only define special cases for UA, or
some subcases of it.

Maybe we need also

 c) indicate whether the (don't) retry logic for SG_IO should be used.

> To be clear, I mean for this approach to be generally useful we would
> have to
> add the cases scsi_execute callers are retrying in scsi_check_sense.
> We could
> then remove the scsi_execute caller driven retries and only have
> scsi_decide_disposition
> retry.
> 
> It sounded really nice at first, but to do that I would have a bunch
> of cases
> that might be specific to pr_ops, alua or scanning, etc. Then I also
> had cases where
> user1 does not want to retry but user2 did, so I have to add more
> SCMD bits. So, in
> the end it will get messier than you initially think.
> 

I agree. I thought about passing a lookup table of sense keys (not) to
be retried to the ML, but it feels over-engineered to me. 

Thanks
Martin


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/4] scsi: Fix passthrough retry counter handling
  2022-08-10  3:41 ` [PATCH 1/4] scsi: Fix passthrough retry counter handling Mike Christie
@ 2022-08-11 12:16   ` Christoph Hellwig
  0 siblings, 0 replies; 17+ messages in thread
From: Christoph Hellwig @ 2022-08-11 12:16 UTC (permalink / raw)
  To: Mike Christie
  Cc: bvanassche, mwilck, hch, martin.petersen, linux-scsi,
	james.bottomley

On Tue, Aug 09, 2022 at 10:41:52PM -0500, Mike Christie wrote:
>  	scsi_init_command(sdev, cmd);
>  
> +	if (!blk_rq_is_passthrough(req))
> +		cmd->allowed = 0;
> +
>  	cmd->eh_eflags = 0;
> -	cmd->allowed = 0;

While this is correct, I think it makes the function read rather odd.

I'd move it down after the:

	if (blk_rq_is_passthrough(req))
		return scsi_setup_scsi_cmnd(sdev, req);

and maybe add a comment;

	/* usually overriden by the ULP */
	cmd->allowed = 0;


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/4] scsi: Add new SUBMITTED types for passthrough
  2022-08-10  3:41 ` [PATCH 2/4] scsi: Add new SUBMITTED types for passthrough Mike Christie
@ 2022-08-11 12:21   ` Christoph Hellwig
  0 siblings, 0 replies; 17+ messages in thread
From: Christoph Hellwig @ 2022-08-11 12:21 UTC (permalink / raw)
  To: Mike Christie
  Cc: bvanassche, mwilck, hch, martin.petersen, linux-scsi,
	james.bottomley

On Tue, Aug 09, 2022 at 10:41:53PM -0500, Mike Christie wrote:
> This adds 2 new SUBMITTED types so we know if a command was queued
> because of a scsi_execute user vs SG IO/tape/cd.
> 
> In the next patch we can then handle errors differently based on what
> submitted the cmd. For scsi_execute we can let the scsi error handler
> retry like normal if the user has requested retries. And for other users
> we can fail fast for device errors like is expected by users like SG IO.

But do we really want to handle errors differently based on the
submitter, or based on what the submitter asks us to do?  I.e. why
should this be based on the callchain vs the intention?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/4] scsi: Internally retry scsi_execute commands
  2022-08-10 17:06     ` Mike Christie
  2022-08-10 17:38       ` Mike Christie
  2022-08-11  9:56       ` Martin Wilck
@ 2022-08-11 12:27       ` Christoph Hellwig
  2 siblings, 0 replies; 17+ messages in thread
From: Christoph Hellwig @ 2022-08-11 12:27 UTC (permalink / raw)
  To: Mike Christie
  Cc: Martin Wilck, bvanassche, hch, martin.petersen, linux-scsi,
	james.bottomley, Hannes Reinecke

On Wed, Aug 10, 2022 at 12:06:41PM -0500, Mike Christie wrote:
> 2. Instead of trying to make it general for all scsi_execute_users, we can
> add SCMD bits for specific cases like DID_TIME_OUT or a SCMD bit that tells
> scsi_noretry_cmd to not always fail passthrough commands just because they
> are passthrough. It would work the opposite of the FASTFAIL bits where instead
> of failing fast, we retry.

Yes, I think this is closer to what I'd like to see.   Although I wonder
if we should turn it around and require the FAILFAST bits to opt out of
automatic retries even for passthrough, even if that turns into a major
audit.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/4] scsi: Internally retry scsi_execute commands
  2022-08-10 17:38       ` Mike Christie
@ 2022-08-11 12:28         ` Christoph Hellwig
  2022-08-11 16:19           ` Mike Christie
  0 siblings, 1 reply; 17+ messages in thread
From: Christoph Hellwig @ 2022-08-11 12:28 UTC (permalink / raw)
  To: Mike Christie
  Cc: Martin Wilck, bvanassche, hch, martin.petersen, linux-scsi,
	james.bottomley, Hannes Reinecke

On Wed, Aug 10, 2022 at 12:38:08PM -0500, Mike Christie wrote:
> It sounded really nice at first, but to do that I would have a bunch of
> cases that might be specific to pr_ops, alua or scanning, etc. Then I
> also had cases where user1 does not want to retry but user2 did, so I
> have to add more SCMD bits. So, in the end it will get messier than
> you initially think.

Do you have an overview of the different cases you've spot so far?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/4] scsi: Internally retry scsi_execute commands
  2022-08-11  9:56       ` Martin Wilck
@ 2022-08-11 16:15         ` Mike Christie
  2022-08-11 17:02           ` Martin Wilck
  0 siblings, 1 reply; 17+ messages in thread
From: Mike Christie @ 2022-08-11 16:15 UTC (permalink / raw)
  To: Martin Wilck, bvanassche, hch, martin.petersen, linux-scsi,
	james.bottomley
  Cc: Hannes Reinecke

On 8/11/22 4:56 AM, Martin Wilck wrote:
>> It could change how 0x04/0x0a is handled because it uses NEEDS_RETRY.
>> However, scsi_dh_alua uses REQ_FAILFAST_DEV so we do not retry in
>> scsi_noretry_cmd like before.
> 
> Not quite following you here - alua_check_sense() is called for any
> command, not just those submitted from the ALUA code.

Ah, I thought you had mentioned alua above because of your alua_rtpg
example. Above I was saying that there was no behavior change for your
alua_rtpg example because it uses REQ_FAILFAST_DEV.


>> 2. Instead of trying to make it general for all scsi_execute_users,
>> we can
>> add SCMD bits for specific cases like DID_TIME_OUT or a SCMD bit that
>> tells
>> scsi_noretry_cmd to not always fail passthrough commands just because
>> they
>> are passthrough. It would work the opposite of the FASTFAIL bits
>> where instead
>> of failing fast, we retry.
>>
>> I think because the cases scsi_noretry_cmd is used for are really
>> specific cases
>> (scsi_decide_disposition sees NEEDS_RETRY, retries < allowed, and
>> REQ_FAILFAST_DEV
>> is not set) that might not be very useful. 
> 
> I don't think it's _that_ speficic. (retries < allowed) is the default
> case, at least for the first failure. REQ_FAILFAST_DEV has very few
> users except for the device handlers, and NEEDS_RETRY is a rather
> frequently used disposition.
I'm saying it's really specific because we only hit this code
path that is causing issues when scsi_check_sense returns NEEDS_RETRY.
There's 5 in there and one in scsi_dh_alua. 4 of them are UAs.

Compared to all the sense errors that we check for in the
scsi_execute callers and including all the times they do a retry for
all errors the 5 cases in scsi_check_sense seemed really specific.

Let me send a patch for this type of design because in the other mail
Christoph was asking for more details. I originally started going that
route so it won't be too much trouble to do a RFC so we can get an
idea of what it will look like.



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/4] scsi: Internally retry scsi_execute commands
  2022-08-11 12:28         ` Christoph Hellwig
@ 2022-08-11 16:19           ` Mike Christie
  0 siblings, 0 replies; 17+ messages in thread
From: Mike Christie @ 2022-08-11 16:19 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Martin Wilck, bvanassche, martin.petersen, linux-scsi,
	james.bottomley, Hannes Reinecke

On 8/11/22 7:28 AM, Christoph Hellwig wrote:
> On Wed, Aug 10, 2022 at 12:38:08PM -0500, Mike Christie wrote:
>> It sounded really nice at first, but to do that I would have a bunch of
>> cases that might be specific to pr_ops, alua or scanning, etc. Then I
>> also had cases where user1 does not want to retry but user2 did, so I
>> have to add more SCMD bits. So, in the end it will get messier than
>> you initially think.
> 
> Do you have an overview of the different cases you've spot so far?

I'll finish up what I was working on for this type of design and I'll
post it as a rough RFC, so it should be easier to discuss in code format.

I'm going to break out the first patch, handle your comment on it, and
send that separately since it's a regression fix and not related to this
discussion.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/4] scsi: Internally retry scsi_execute commands
  2022-08-11 16:15         ` Mike Christie
@ 2022-08-11 17:02           ` Martin Wilck
  2022-08-11 18:22             ` Mike Christie
  0 siblings, 1 reply; 17+ messages in thread
From: Martin Wilck @ 2022-08-11 17:02 UTC (permalink / raw)
  To: Mike Christie, bvanassche, hch, martin.petersen, linux-scsi,
	james.bottomley
  Cc: Hannes Reinecke

On Thu, 2022-08-11 at 11:15 -0500, Mike Christie wrote:
> > 
> > I don't think it's _that_ speficic. (retries < allowed) is the
> > default
> > case, at least for the first failure. REQ_FAILFAST_DEV has very few
> > users except for the device handlers, and NEEDS_RETRY is a rather
> > frequently used disposition.
> I'm saying it's really specific because we only hit this code
> path that is causing issues when scsi_check_sense returns
> NEEDS_RETRY.

What about the other cases in scsi_decide_disposition() that jump to
maybe_retry?


> There's 5 in there and one in scsi_dh_alua. 4 of them are UAs.
> 
> Compared to all the sense errors that we check for in the
> scsi_execute callers and including all the times they do a retry for
> all errors the 5 cases in scsi_check_sense seemed really specific.
> 
> Let me send a patch for this type of design because in the other mail
> Christoph was asking for more details. I originally started going
> that
> route so it won't be too much trouble to do a RFC so we can get an
> idea of what it will look like.

Looking forward to it.

Martin


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/4] scsi: Internally retry scsi_execute commands
  2022-08-11 17:02           ` Martin Wilck
@ 2022-08-11 18:22             ` Mike Christie
  0 siblings, 0 replies; 17+ messages in thread
From: Mike Christie @ 2022-08-11 18:22 UTC (permalink / raw)
  To: Martin Wilck, bvanassche, hch, martin.petersen, linux-scsi,
	james.bottomley
  Cc: Hannes Reinecke

On 8/11/22 12:02 PM, Martin Wilck wrote:
> On Thu, 2022-08-11 at 11:15 -0500, Mike Christie wrote:
>>>
>>> I don't think it's _that_ speficic. (retries < allowed) is the
>>> default
>>> case, at least for the first failure. REQ_FAILFAST_DEV has very few
>>> users except for the device handlers, and NEEDS_RETRY is a rather
>>> frequently used disposition.
>> I'm saying it's really specific because we only hit this code
>> path that is causing issues when scsi_check_sense returns
>> NEEDS_RETRY.
> 
> What about the other cases in scsi_decide_disposition() that jump to
> maybe_retry?

Ok one exception to what I wrote. DID_TIMEOUT hits the
blk_rq_is_passthrough test of course :)

The rest hit that check condition check and are retried like normal IO:

scsi_noretry_cmd()
.....

	switch (host_byte(scmd->result)) {
...

        if (!scsi_status_is_check_condition(scmd->result))
                return false;

We get to that blk_rq_is_passthrough for DID_TIME_OUT because it
has a goto to bypass the above test. The other host errors we return
false above and retry.

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2022-08-11 18:22 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-08-10  3:41 [PATCH 0/4] scsi: passthrough fixes/improvements Mike Christie
2022-08-10  3:41 ` [PATCH 1/4] scsi: Fix passthrough retry counter handling Mike Christie
2022-08-11 12:16   ` Christoph Hellwig
2022-08-10  3:41 ` [PATCH 2/4] scsi: Add new SUBMITTED types for passthrough Mike Christie
2022-08-11 12:21   ` Christoph Hellwig
2022-08-10  3:41 ` [PATCH 3/4] scsi: Internally retry scsi_execute commands Mike Christie
2022-08-10 10:46   ` Martin Wilck
2022-08-10 17:06     ` Mike Christie
2022-08-10 17:38       ` Mike Christie
2022-08-11 12:28         ` Christoph Hellwig
2022-08-11 16:19           ` Mike Christie
2022-08-11  9:56       ` Martin Wilck
2022-08-11 16:15         ` Mike Christie
2022-08-11 17:02           ` Martin Wilck
2022-08-11 18:22             ` Mike Christie
2022-08-11 12:27       ` Christoph Hellwig
2022-08-10  3:41 ` [PATCH 4/4] scsi: Handle UAs for pr_ops Mike Christie

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox