linux-scsi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4 0/9] Retry READ CAPACITY(10)/(16) with good status but no data
@ 2025-10-02 19:25 Ewan D. Milne
  2025-10-02 19:25 ` [PATCH v4 1/9] scsi: Explicitly specify .ascq = 0x00 for ASC 0x28/0x29 scsi_failures Ewan D. Milne
                   ` (8 more replies)
  0 siblings, 9 replies; 34+ messages in thread
From: Ewan D. Milne @ 2025-10-02 19:25 UTC (permalink / raw)
  To: linux-scsi; +Cc: michael.christie, dgilbert, bvanassche, dlemoal, hare

We encountered a SCSI device that responded to the initial READ CAPACITY command
with a good status, but no data was transferred.  This caused a sudden change of
the device capacity to zero when the device was rescanned, for no obvious reason.

This patch series changes read_capacity_10() and read_capacity_16() in sd.c
to retry the command up to 3 times in an attempt to get valid capacity information.
A message is logged if this is ultimately unsuccessful.

There are some predecessor patches, one from a patch in a series by Mike Christie
which changes read_capacity_16() to use the scsi_failures mechanism (which did
not eventually get merged), this makes the changes here much more similar for
both the read_capacity_10 and read_capacity_16() case.  Another patch corrects
a potential use of an uninitialized variable, and a third one removes a check
for -EOVERFLOW that hasn't been needed since commit 72deb455b5ec
("block: remove CONFIG_LBDAF").  Other patches fill in missing .ascq entries
in the scsi_failures array and address other review comments.

The final patches to scsi_debug allow insertion of the fault to test this change.

Changes in v4:
  - Add "only_once" module option to scsi_debug to allow testing of retry logic
  - Reset sdeb_inject_pending after OPT_NO_DATA fault is injected
  - Changed one patch subject message to be less verbose
  - Clarlified one commit message
  - Simplified a conditional in 2 places in one patch, and 1 in another

Changes in v3:
  - Removed patch to pass the length of the buffer through the sd_read_capacity()
    call chain and adjusted other patches accordingly.  Use RC10/16_LEN for memset()
  - Removed supurfluous parenthesis in conditionals

Changes in v2:
  - Added patches to explicitly specify .ascq in scsi_features usage
  - Pass the length of the buffer used through the sd_read_capacity() call chain
  - Simplify a conditional in scsi_probe_lun() that was requested in similar
    code in read_capacity_16()/read_capacity_10()
  - Changed code in scsi_debug() to make only one call to scsi_set_resid()
  - Moved some declarations around in read_capacity_16()/read_capacity_10()
    and memset() the whole buffer instead of the expected data size
  - Add the newly added flag SDEBUG_NO_DATA to SDEBUG_OPT_ALL_INJECTING

Ewan D. Milne (9):
  scsi: Explicitly specify .ascq = 0x00 for ASC 0x28/0x29 scsi_failures
  scsi: sd: Do not retry ASC 0x3a in read_capacity_10() with any ASCQ
  scsi: sd: Have scsi-ml retry read_capacity_16 errors
  scsi: sd: Avoid passing potentially uninitialized "sense_valid" to
    read_capacity_error()
  scsi: sd: Remove checks for -EOVERFLOW in sd_read_capacity()
  scsi: sd: Check for and retry in case of READ_CAPCITY(10)/(16)
    returning no data
  scsi: Simplify nested if conditional in scsi_probe_lun()
  scsi: scsi_debug: Add option to suppress returned data but return good
    status
  scsi: scsi_debug: Add "only_once" module option to inject an error one
    time

 drivers/scsi/scsi_debug.c |  99 +++++++++++++++++-----
 drivers/scsi/scsi_scan.c  |  20 ++---
 drivers/scsi/sd.c         | 167 ++++++++++++++++++++++++++++----------
 3 files changed, 212 insertions(+), 74 deletions(-)

-- 
2.47.1


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH v4 1/9] scsi: Explicitly specify .ascq = 0x00 for ASC 0x28/0x29 scsi_failures
  2025-10-02 19:25 [PATCH v4 0/9] Retry READ CAPACITY(10)/(16) with good status but no data Ewan D. Milne
@ 2025-10-02 19:25 ` Ewan D. Milne
  2025-10-03  4:24   ` Damien Le Moal
  2025-10-06  6:59   ` Hannes Reinecke
  2025-10-02 19:25 ` [PATCH v4 2/9] scsi: sd: Do not retry ASC 0x3a in read_capacity_10() with any ASCQ Ewan D. Milne
                   ` (7 subsequent siblings)
  8 siblings, 2 replies; 34+ messages in thread
From: Ewan D. Milne @ 2025-10-02 19:25 UTC (permalink / raw)
  To: linux-scsi; +Cc: michael.christie, dgilbert, bvanassche, dlemoal, hare

This does not change any behavior (since .ascq was initialized to 0 by
the compiler) but makes explicit that the entry in the scsi_failures
array does not handle cases where ASCQ is nonzero, consistent with other
usage.

Signed-off-by: Ewan D. Milne <emilne@redhat.com>
---
 drivers/scsi/scsi_scan.c | 2 ++
 drivers/scsi/sd.c        | 1 +
 2 files changed, 3 insertions(+)

diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index 3c6e089e80c3..c754b1d566e0 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -660,11 +660,13 @@ static int scsi_probe_lun(struct scsi_device *sdev, unsigned char *inq_result,
 		{
 			.sense = UNIT_ATTENTION,
 			.asc = 0x28,
+			.ascq = 0x00,
 			.result = SAM_STAT_CHECK_CONDITION,
 		},
 		{
 			.sense = UNIT_ATTENTION,
 			.asc = 0x29,
+			.ascq = 0x00,
 			.result = SAM_STAT_CHECK_CONDITION,
 		},
 		{
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 5b8668accf8e..78f5903cc8d0 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -2740,6 +2740,7 @@ static int read_capacity_10(struct scsi_disk *sdkp, struct scsi_device *sdp,
 		{
 			.sense = UNIT_ATTENTION,
 			.asc = 0x29,
+			.ascq = 0x00,
 			.allowed = READ_CAPACITY_RETRIES_ON_RESET,
 			.result = SAM_STAT_CHECK_CONDITION,
 		},
-- 
2.47.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v4 2/9] scsi: sd: Do not retry ASC 0x3a in read_capacity_10() with any ASCQ
  2025-10-02 19:25 [PATCH v4 0/9] Retry READ CAPACITY(10)/(16) with good status but no data Ewan D. Milne
  2025-10-02 19:25 ` [PATCH v4 1/9] scsi: Explicitly specify .ascq = 0x00 for ASC 0x28/0x29 scsi_failures Ewan D. Milne
@ 2025-10-02 19:25 ` Ewan D. Milne
  2025-10-03  4:24   ` Damien Le Moal
  2025-10-02 19:25 ` [PATCH v4 3/9] scsi: sd: Have scsi-ml retry read_capacity_16 errors Ewan D. Milne
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 34+ messages in thread
From: Ewan D. Milne @ 2025-10-02 19:25 UTC (permalink / raw)
  To: linux-scsi; +Cc: michael.christie, dgilbert, bvanassche, dlemoal, hare

This makes the handling in read_capacity_10() consistent with other
cases, e.g. sd_spinup_disk().  Omitting .ascq in scsi_failure did not
result in wildcard matching, it only handled ASCQ 0x00.  This patch
changes the retry behavior, we no longer retry 3 times on ASC 0x3a
if a nonzero ASCQ is ever returned.

Signed-off-by: Ewan D. Milne <emilne@redhat.com>
---
 drivers/scsi/sd.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 78f5903cc8d0..e3b802b26f0e 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -2729,11 +2729,13 @@ static int read_capacity_10(struct scsi_disk *sdkp, struct scsi_device *sdp,
 		{
 			.sense = UNIT_ATTENTION,
 			.asc = 0x3A,
+			.ascq = SCMD_FAILURE_ASCQ_ANY,
 			.result = SAM_STAT_CHECK_CONDITION,
 		},
 		{
 			.sense = NOT_READY,
 			.asc = 0x3A,
+			.ascq = SCMD_FAILURE_ASCQ_ANY,
 			.result = SAM_STAT_CHECK_CONDITION,
 		},
 		 /* Device reset might occur several times so retry a lot */
-- 
2.47.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v4 3/9] scsi: sd: Have scsi-ml retry read_capacity_16 errors
  2025-10-02 19:25 [PATCH v4 0/9] Retry READ CAPACITY(10)/(16) with good status but no data Ewan D. Milne
  2025-10-02 19:25 ` [PATCH v4 1/9] scsi: Explicitly specify .ascq = 0x00 for ASC 0x28/0x29 scsi_failures Ewan D. Milne
  2025-10-02 19:25 ` [PATCH v4 2/9] scsi: sd: Do not retry ASC 0x3a in read_capacity_10() with any ASCQ Ewan D. Milne
@ 2025-10-02 19:25 ` Ewan D. Milne
  2025-10-06  2:00   ` Damien Le Moal
  2025-10-06  7:01   ` Hannes Reinecke
  2025-10-02 19:25 ` [PATCH v4 4/9] scsi: sd: Avoid passing potentially uninitialized "sense_valid" to read_capacity_error() Ewan D. Milne
                   ` (5 subsequent siblings)
  8 siblings, 2 replies; 34+ messages in thread
From: Ewan D. Milne @ 2025-10-02 19:25 UTC (permalink / raw)
  To: linux-scsi; +Cc: michael.christie, dgilbert, bvanassche, dlemoal, hare

This has read_capacity_16 have scsi-ml retry errors instead of driving
them itself.

There are 2 behavior changes with this patch:
1. There is one behavior change where we no longer retry when
scsi_execute_cmd returns < 0, but we should be ok. We don't need to retry
for failures like the queue being removed, and for the case where there
are no tags/reqs since the block layer waits/retries for us. For possible
memory allocation failures from blk_rq_map_kern we use GFP_NOIO, so
retrying will probably not help.
2. For the specific UAs we checked for and retried, we would get
READ_CAPACITY_RETRIES_ON_RESET retries plus whatever retries were left
from the main loop's retries. Each UA now gets
READ_CAPACITY_RETRIES_ON_RESET reties, and the other errors get up to 3
retries. This is most likely ok, because READ_CAPACITY_RETRIES_ON_RESET
is already 10 and is not based on anything specific like a spec or
device, so the extra 3 we got from the main loop was probably just an
accident and is not going to help.

Original patch by Mike Christie <michael.christie@oracle.com> modified
based upon review comments for an earlier version of this patch.

Signed-off-by: Ewan D. Milne <emilne@redhat.com>
---
 drivers/scsi/sd.c | 107 +++++++++++++++++++++++++++++++---------------
 1 file changed, 73 insertions(+), 34 deletions(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index e3b802b26f0e..25561d01f972 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -2631,14 +2631,66 @@ static void read_capacity_error(struct scsi_disk *sdkp, struct scsi_device *sdp,
 static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp,
 		struct queue_limits *lim, unsigned char *buffer)
 {
-	unsigned char cmd[16];
+	static const u8 cmd[16] = {
+		[0] = SERVICE_ACTION_IN_16,
+		[1] = SAI_READ_CAPACITY_16,
+		[13] = RC16_LEN,
+	};
 	struct scsi_sense_hdr sshdr;
+	struct scsi_failure failure_defs[] = {
+		/*
+		 * Do not retry Invalid Command Operation Code or Invalid
+		 * Field in CDB.
+		 */
+		{
+			.sense = ILLEGAL_REQUEST,
+			.asc = 0x20,
+			.ascq = 0x00,
+			.result = SAM_STAT_CHECK_CONDITION,
+		},
+		{
+			.sense = ILLEGAL_REQUEST,
+			.asc = 0x24,
+			.ascq = 0x00,
+			.result = SAM_STAT_CHECK_CONDITION,
+		},
+		/* Do not retry Medium Not Present */
+		{
+			.sense = UNIT_ATTENTION,
+			.asc = 0x3A,
+			.ascq = SCMD_FAILURE_ASCQ_ANY,
+			.result = SAM_STAT_CHECK_CONDITION,
+		},
+		{
+			.sense = NOT_READY,
+			.asc = 0x3A,
+			.ascq = SCMD_FAILURE_ASCQ_ANY,
+			.result = SAM_STAT_CHECK_CONDITION,
+		},
+		/* Device reset might occur several times so retry a lot */
+		{
+			.sense = UNIT_ATTENTION,
+			.asc = 0x29,
+			.ascq = 0x00,
+			.allowed = READ_CAPACITY_RETRIES_ON_RESET,
+			.result = SAM_STAT_CHECK_CONDITION,
+		},
+		/* Any other error not listed above retry 3 times */
+		{
+			.result = SCMD_FAILURE_RESULT_ANY,
+			.allowed = 3,
+		},
+		{}
+	};
+	struct scsi_failures failures = {
+		.failure_definitions = failure_defs,
+	};
 	const struct scsi_exec_args exec_args = {
 		.sshdr = &sshdr,
+		.failures = &failures,
 	};
 	int sense_valid = 0;
 	int the_result;
-	int retries = 3, reset_retries = READ_CAPACITY_RETRIES_ON_RESET;
 	unsigned int alignment;
 	unsigned long long lba;
 	unsigned sector_size;
@@ -2646,40 +2698,27 @@ static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp,
 	if (sdp->no_read_capacity_16)
 		return -EINVAL;
 
-	do {
-		memset(cmd, 0, 16);
-		cmd[0] = SERVICE_ACTION_IN_16;
-		cmd[1] = SAI_READ_CAPACITY_16;
-		cmd[13] = RC16_LEN;
-		memset(buffer, 0, RC16_LEN);
-
-		the_result = scsi_execute_cmd(sdp, cmd, REQ_OP_DRV_IN,
-					      buffer, RC16_LEN, SD_TIMEOUT,
-					      sdkp->max_retries, &exec_args);
-		if (the_result > 0) {
-			if (media_not_present(sdkp, &sshdr))
-				return -ENODEV;
+	memset(buffer, 0, RC16_LEN);
 
-			sense_valid = scsi_sense_valid(&sshdr);
-			if (sense_valid &&
-			    sshdr.sense_key == ILLEGAL_REQUEST &&
-			    (sshdr.asc == 0x20 || sshdr.asc == 0x24) &&
-			    sshdr.ascq == 0x00)
-				/* Invalid Command Operation Code or
-				 * Invalid Field in CDB, just retry
-				 * silently with RC10 */
-				return -EINVAL;
-			if (sense_valid &&
-			    sshdr.sense_key == UNIT_ATTENTION &&
-			    sshdr.asc == 0x29 && sshdr.ascq == 0x00)
-				/* Device reset might occur several times,
-				 * give it one more chance */
-				if (--reset_retries > 0)
-					continue;
-		}
-		retries--;
+	the_result = scsi_execute_cmd(sdp, cmd, REQ_OP_DRV_IN, buffer,
+				      RC16_LEN, SD_TIMEOUT, sdkp->max_retries,
+				      &exec_args);
 
-	} while (the_result && retries);
+	if (the_result > 0) {
+		if (media_not_present(sdkp, &sshdr))
+			return -ENODEV;
+
+		sense_valid = scsi_sense_valid(&sshdr);
+		if (sense_valid && sshdr.sense_key == ILLEGAL_REQUEST &&
+		    (sshdr.asc == 0x20 || sshdr.asc == 0x24) &&
+		     sshdr.ascq == 0x00) {
+			/*
+			 * Invalid Command Operation Code or Invalid Field in
+			 * CDB, just retry silently with RC10
+			 */
+			return -EINVAL;
+		}
+	}
 
 	if (the_result) {
 		sd_print_result(sdkp, "Read Capacity(16) failed", the_result);
-- 
2.47.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v4 4/9] scsi: sd: Avoid passing potentially uninitialized "sense_valid" to read_capacity_error()
  2025-10-02 19:25 [PATCH v4 0/9] Retry READ CAPACITY(10)/(16) with good status but no data Ewan D. Milne
                   ` (2 preceding siblings ...)
  2025-10-02 19:25 ` [PATCH v4 3/9] scsi: sd: Have scsi-ml retry read_capacity_16 errors Ewan D. Milne
@ 2025-10-02 19:25 ` Ewan D. Milne
  2025-10-06  2:02   ` Damien Le Moal
  2025-10-06  7:06   ` Hannes Reinecke
  2025-10-02 19:25 ` [PATCH v4 5/9] scsi: sd: Remove checks for -EOVERFLOW in sd_read_capacity() Ewan D. Milne
                   ` (4 subsequent siblings)
  8 siblings, 2 replies; 34+ messages in thread
From: Ewan D. Milne @ 2025-10-02 19:25 UTC (permalink / raw)
  To: linux-scsi; +Cc: michael.christie, dgilbert, bvanassche, dlemoal, hare

read_capacity_10() sets "sense_valid" in a different conditional statement prior to
calling read_capacity_error(), and does not use this value otherwise.  Move the call
to scsi_sense_valid() to read_capacity_error() instead of passing it as a parameter
from read_capacity_16() and read_capacity_10().

Signed-off-by: Ewan D. Milne <emilne@redhat.com>
---
 drivers/scsi/sd.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 25561d01f972..d465609a66e3 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -2597,9 +2597,10 @@ static void sd_config_protection(struct scsi_disk *sdkp,
 }
 
 static void read_capacity_error(struct scsi_disk *sdkp, struct scsi_device *sdp,
-			struct scsi_sense_hdr *sshdr, int sense_valid,
-			int the_result)
+				struct scsi_sense_hdr *sshdr, int the_result)
 {
+	bool sense_valid = scsi_sense_valid(sshdr);
+
 	if (sense_valid)
 		sd_print_sense_hdr(sdkp, sshdr);
 	else
@@ -2722,7 +2723,7 @@ static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp,
 
 	if (the_result) {
 		sd_print_result(sdkp, "Read Capacity(16) failed", the_result);
-		read_capacity_error(sdkp, sdp, &sshdr, sense_valid, the_result);
+		read_capacity_error(sdkp, sdp, &sshdr, the_result);
 		return -EINVAL;
 	}
 
@@ -2799,7 +2800,6 @@ static int read_capacity_10(struct scsi_disk *sdkp, struct scsi_device *sdp,
 		.sshdr = &sshdr,
 		.failures = &failures,
 	};
-	int sense_valid = 0;
 	int the_result;
 	sector_t lba;
 	unsigned sector_size;
@@ -2811,15 +2811,13 @@ static int read_capacity_10(struct scsi_disk *sdkp, struct scsi_device *sdp,
 				      &exec_args);
 
 	if (the_result > 0) {
-		sense_valid = scsi_sense_valid(&sshdr);
-
 		if (media_not_present(sdkp, &sshdr))
 			return -ENODEV;
 	}
 
 	if (the_result) {
 		sd_print_result(sdkp, "Read Capacity(10) failed", the_result);
-		read_capacity_error(sdkp, sdp, &sshdr, sense_valid, the_result);
+		read_capacity_error(sdkp, sdp, &sshdr, the_result);
 		return -EINVAL;
 	}
 
-- 
2.47.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v4 5/9] scsi: sd: Remove checks for -EOVERFLOW in sd_read_capacity()
  2025-10-02 19:25 [PATCH v4 0/9] Retry READ CAPACITY(10)/(16) with good status but no data Ewan D. Milne
                   ` (3 preceding siblings ...)
  2025-10-02 19:25 ` [PATCH v4 4/9] scsi: sd: Avoid passing potentially uninitialized "sense_valid" to read_capacity_error() Ewan D. Milne
@ 2025-10-02 19:25 ` Ewan D. Milne
  2025-10-06  2:04   ` Damien Le Moal
  2025-10-06  7:07   ` Hannes Reinecke
  2025-10-02 19:25 ` [PATCH v4 6/9] scsi: sd: Check for and retry in case of READ_CAPCITY(10)/(16) returning no data Ewan D. Milne
                   ` (3 subsequent siblings)
  8 siblings, 2 replies; 34+ messages in thread
From: Ewan D. Milne @ 2025-10-02 19:25 UTC (permalink / raw)
  To: linux-scsi; +Cc: michael.christie, dgilbert, bvanassche, dlemoal, hare

Remove checks for -EOVERFLOW in sd_read_capacity() because this value has not
been returned to it since commit 72deb455b5ec ("block: remove CONFIG_LBDAF").

Signed-off-by: Ewan D. Milne <emilne@redhat.com>
---
 drivers/scsi/sd.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index d465609a66e3..acd79e9a0d82 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -2863,8 +2863,6 @@ sd_read_capacity(struct scsi_disk *sdkp, struct queue_limits *lim,
 
 	if (sd_try_rc16_first(sdp)) {
 		sector_size = read_capacity_16(sdkp, sdp, lim, buffer);
-		if (sector_size == -EOVERFLOW)
-			goto got_data;
 		if (sector_size == -ENODEV)
 			return;
 		if (sector_size < 0)
@@ -2873,8 +2871,6 @@ sd_read_capacity(struct scsi_disk *sdkp, struct queue_limits *lim,
 			return;
 	} else {
 		sector_size = read_capacity_10(sdkp, sdp, buffer);
-		if (sector_size == -EOVERFLOW)
-			goto got_data;
 		if (sector_size < 0)
 			return;
 		if ((sizeof(sdkp->capacity) > 4) &&
-- 
2.47.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v4 6/9] scsi: sd: Check for and retry in case of READ_CAPCITY(10)/(16) returning no data
  2025-10-02 19:25 [PATCH v4 0/9] Retry READ CAPACITY(10)/(16) with good status but no data Ewan D. Milne
                   ` (4 preceding siblings ...)
  2025-10-02 19:25 ` [PATCH v4 5/9] scsi: sd: Remove checks for -EOVERFLOW in sd_read_capacity() Ewan D. Milne
@ 2025-10-02 19:25 ` Ewan D. Milne
  2025-10-06  2:06   ` Damien Le Moal
  2025-10-06  7:10   ` Hannes Reinecke
  2025-10-02 19:25 ` [PATCH v4 7/9] scsi: Simplify nested if conditional in scsi_probe_lun() Ewan D. Milne
                   ` (2 subsequent siblings)
  8 siblings, 2 replies; 34+ messages in thread
From: Ewan D. Milne @ 2025-10-02 19:25 UTC (permalink / raw)
  To: linux-scsi; +Cc: michael.christie, dgilbert, bvanassche, dlemoal, hare

sd_read_capacity_10() and sd_read_capacity_16() do not check for underflow
and can extract invalid (e.g. zero) data when a malfunctioning device does
not actually transfer any data, but returns a good status otherwise.
Check for this and retry the command.  Log a message and return -EINVAL if
we can't get the capacity information.

We encountered a device that did this once but returned good data afterwards.

See similar commit 5cd3bbfad088 ("[SCSI] retry with missing data for INQUIRY")

Signed-off-by: Ewan D. Milne <emilne@redhat.com>
---
 drivers/scsi/sd.c | 59 ++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 51 insertions(+), 8 deletions(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index acd79e9a0d82..d1a366c9c99e 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -2638,6 +2638,7 @@ static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp,
 		[13] = RC16_LEN,
 	};
 	struct scsi_sense_hdr sshdr;
+	int count, resid;
 	struct scsi_failure failure_defs[] = {
 		/*
 		 * Do not retry Invalid Command Operation Code or Invalid
@@ -2688,6 +2689,7 @@ static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp,
 	};
 	const struct scsi_exec_args exec_args = {
 		.sshdr = &sshdr,
+		.resid = &resid,
 		.failures = &failures,
 	};
 	int sense_valid = 0;
@@ -2699,11 +2701,22 @@ static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp,
 	if (sdp->no_read_capacity_16)
 		return -EINVAL;
 
-	memset(buffer, 0, RC16_LEN);
+	for (count = 0; count < 3; ++count) {
+		memset(buffer, 0, RC16_LEN);
 
-	the_result = scsi_execute_cmd(sdp, cmd, REQ_OP_DRV_IN, buffer,
-				      RC16_LEN, SD_TIMEOUT, sdkp->max_retries,
-				      &exec_args);
+		the_result = scsi_execute_cmd(sdp, cmd, REQ_OP_DRV_IN,
+					      buffer, RC16_LEN, SD_TIMEOUT,
+					      sdkp->max_retries, &exec_args);
+
+		if (the_result || resid != RC16_LEN)
+			break;
+
+		/*
+		 * If the status was good but nothing was transferred,
+		 * we retry. It is a workaround for some buggy devices
+		 * or SAT which sometimes do not return any data.
+		 */
+	}
 
 	if (the_result > 0) {
 		if (media_not_present(sdkp, &sshdr))
@@ -2727,6 +2740,12 @@ static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp,
 		return -EINVAL;
 	}
 
+	if (resid == RC16_LEN) {
+		sd_printk(KERN_ERR, sdkp,
+			  "Read Capacity(16) returned good status but no data");
+		return -EINVAL;
+	}
+
 	sector_size = get_unaligned_be32(&buffer[8]);
 	lba = get_unaligned_be64(&buffer[0]);
 
@@ -2759,11 +2778,17 @@ static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp,
 	return sector_size;
 }
 
+#define RC10_LEN 8
+#if RC10_LEN > SD_BUF_SIZE
+#error RC10_LEN must not be more than SD_BUF_SIZE
+#endif
+
 static int read_capacity_10(struct scsi_disk *sdkp, struct scsi_device *sdp,
 						unsigned char *buffer)
 {
 	static const u8 cmd[10] = { READ_CAPACITY };
 	struct scsi_sense_hdr sshdr;
+	int count, resid;
 	struct scsi_failure failure_defs[] = {
 		/* Do not retry Medium Not Present */
 		{
@@ -2798,17 +2823,29 @@ static int read_capacity_10(struct scsi_disk *sdkp, struct scsi_device *sdp,
 	};
 	const struct scsi_exec_args exec_args = {
 		.sshdr = &sshdr,
+		.resid = &resid,
 		.failures = &failures,
 	};
 	int the_result;
 	sector_t lba;
 	unsigned sector_size;
 
-	memset(buffer, 0, 8);
+	for (count = 0; count < 3; ++count) {
+		memset(buffer, 0, RC10_LEN);
+
+		the_result = scsi_execute_cmd(sdp, cmd, REQ_OP_DRV_IN,
+					      buffer, RC10_LEN, SD_TIMEOUT,
+					      sdkp->max_retries, &exec_args);
+
+		if (the_result || resid != RC10_LEN)
+			break;
 
-	the_result = scsi_execute_cmd(sdp, cmd, REQ_OP_DRV_IN, buffer,
-				      8, SD_TIMEOUT, sdkp->max_retries,
-				      &exec_args);
+		/*
+		 * If the status was good but nothing was transferred,
+		 * we retry. It is a workaround for some buggy devices
+		 * or SAT which sometimes do not return any data.
+		 */
+	}
 
 	if (the_result > 0) {
 		if (media_not_present(sdkp, &sshdr))
@@ -2821,6 +2858,12 @@ static int read_capacity_10(struct scsi_disk *sdkp, struct scsi_device *sdp,
 		return -EINVAL;
 	}
 
+	if (resid == RC10_LEN) {
+		sd_printk(KERN_ERR, sdkp,
+			  "Read Capacity(10) returned good status but no data");
+		return -EINVAL;
+	}
+
 	sector_size = get_unaligned_be32(&buffer[4]);
 	lba = get_unaligned_be32(&buffer[0]);
 
-- 
2.47.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v4 7/9] scsi: Simplify nested if conditional in scsi_probe_lun()
  2025-10-02 19:25 [PATCH v4 0/9] Retry READ CAPACITY(10)/(16) with good status but no data Ewan D. Milne
                   ` (5 preceding siblings ...)
  2025-10-02 19:25 ` [PATCH v4 6/9] scsi: sd: Check for and retry in case of READ_CAPCITY(10)/(16) returning no data Ewan D. Milne
@ 2025-10-02 19:25 ` Ewan D. Milne
  2025-10-06  2:07   ` Damien Le Moal
  2025-10-06  7:11   ` Hannes Reinecke
  2025-10-02 19:25 ` [PATCH v4 8/9] scsi: scsi_debug: Add option to suppress returned data but return good status Ewan D. Milne
  2025-10-02 19:25 ` [PATCH v4 9/9] scsi: scsi_debug: Add "only_once" module option to inject an error one time Ewan D. Milne
  8 siblings, 2 replies; 34+ messages in thread
From: Ewan D. Milne @ 2025-10-02 19:25 UTC (permalink / raw)
  To: linux-scsi; +Cc: michael.christie, dgilbert, bvanassche, dlemoal, hare

Make code congruent with similar code in read_capacity_16()/read_capacity_10().

Signed-off-by: Ewan D. Milne <emilne@redhat.com>
---
 drivers/scsi/scsi_scan.c | 18 ++++++++----------
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index c754b1d566e0..348ecfe5cdb0 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -717,16 +717,14 @@ static int scsi_probe_lun(struct scsi_device *sdev, unsigned char *inq_result,
 				"scsi scan: INQUIRY %s with code 0x%x\n",
 				result ? "failed" : "successful", result));
 
-		if (result == 0) {
-			/*
-			 * if nothing was transferred, we try
-			 * again. It's a workaround for some USB
-			 * devices.
-			 */
-			if (resid == try_inquiry_len)
-				continue;
-		}
-		break;
+		if (result || resid != try_inquiry_len)
+			break;
+
+		/*
+		 * If the status was good but nothing was transferred,
+		 * we retry. It is a workaround for some buggy devices
+		 * or SAT which sometimes do not return any data.
+		 */
 	}
 
 	if (result == 0) {
-- 
2.47.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v4 8/9] scsi: scsi_debug: Add option to suppress returned data but return good status
  2025-10-02 19:25 [PATCH v4 0/9] Retry READ CAPACITY(10)/(16) with good status but no data Ewan D. Milne
                   ` (6 preceding siblings ...)
  2025-10-02 19:25 ` [PATCH v4 7/9] scsi: Simplify nested if conditional in scsi_probe_lun() Ewan D. Milne
@ 2025-10-02 19:25 ` Ewan D. Milne
  2025-10-06  2:09   ` Damien Le Moal
  2025-10-06  7:12   ` Hannes Reinecke
  2025-10-02 19:25 ` [PATCH v4 9/9] scsi: scsi_debug: Add "only_once" module option to inject an error one time Ewan D. Milne
  8 siblings, 2 replies; 34+ messages in thread
From: Ewan D. Milne @ 2025-10-02 19:25 UTC (permalink / raw)
  To: linux-scsi; +Cc: michael.christie, dgilbert, bvanassche, dlemoal, hare

This is used to test the earlier read_capacity_10()/16() retry patch.

Signed-off-by: Ewan D. Milne <emilne@redhat.com>
---
 drivers/scsi/scsi_debug.c | 49 ++++++++++++++++++++++++++++-----------
 1 file changed, 36 insertions(+), 13 deletions(-)

diff --git a/drivers/scsi/scsi_debug.c b/drivers/scsi/scsi_debug.c
index 353cb60e1abe..2c49aadaef80 100644
--- a/drivers/scsi/scsi_debug.c
+++ b/drivers/scsi/scsi_debug.c
@@ -230,6 +230,7 @@ struct tape_block {
 #define SDEBUG_OPT_NO_CDB_NOISE		0x4000
 #define SDEBUG_OPT_HOST_BUSY		0x8000
 #define SDEBUG_OPT_CMD_ABORT		0x10000
+#define SDEBUG_OPT_NO_DATA		0x20000
 #define SDEBUG_OPT_ALL_NOISE (SDEBUG_OPT_NOISE | SDEBUG_OPT_Q_NOISE | \
 			      SDEBUG_OPT_RESET_NOISE)
 #define SDEBUG_OPT_ALL_INJECTING (SDEBUG_OPT_RECOVERED_ERR | \
@@ -237,7 +238,8 @@ struct tape_block {
 				  SDEBUG_OPT_DIF_ERR | SDEBUG_OPT_DIX_ERR | \
 				  SDEBUG_OPT_SHORT_TRANSFER | \
 				  SDEBUG_OPT_HOST_BUSY | \
-				  SDEBUG_OPT_CMD_ABORT)
+				  SDEBUG_OPT_CMD_ABORT | \
+				  SDEBUG_OPT_NO_DATA)
 #define SDEBUG_OPT_RECOV_DIF_DIX (SDEBUG_OPT_RECOVERED_ERR | \
 				  SDEBUG_OPT_DIF_ERR | SDEBUG_OPT_DIX_ERR)
 
@@ -1633,7 +1635,7 @@ static int make_ua(struct scsi_cmnd *scp, struct sdebug_dev_info *devip)
 static int fill_from_dev_buffer(struct scsi_cmnd *scp, unsigned char *arr,
 				int arr_len)
 {
-	int act_len;
+	int act_len, resid;
 	struct scsi_data_buffer *sdb = &scp->sdb;
 
 	if (!sdb->length)
@@ -1641,9 +1643,19 @@ static int fill_from_dev_buffer(struct scsi_cmnd *scp, unsigned char *arr,
 	if (scp->sc_data_direction != DMA_FROM_DEVICE)
 		return DID_ERROR << 16;
 
-	act_len = sg_copy_from_buffer(sdb->table.sgl, sdb->table.nents,
-				      arr, arr_len);
-	scsi_set_resid(scp, scsi_bufflen(scp) - act_len);
+	/*
+	 * Conditionally suppress DATA IN transfer and leave resid set to bufflen.
+	 */
+	if (unlikely((sdebug_opts & SDEBUG_OPT_NO_DATA) &&
+		      atomic_read(&sdeb_inject_pending))) {
+		resid = scsi_bufflen(scp);
+		atomic_set(&sdeb_inject_pending, 0);
+	} else {
+		act_len = sg_copy_from_buffer(sdb->table.sgl, sdb->table.nents,
+					      arr, arr_len);
+		resid = scsi_bufflen(scp) - act_len;
+	}
+	scsi_set_resid(scp, resid);
 
 	return 0;
 }
@@ -1656,7 +1668,7 @@ static int fill_from_dev_buffer(struct scsi_cmnd *scp, unsigned char *arr,
 static int p_fill_from_dev_buffer(struct scsi_cmnd *scp, const void *arr,
 				  int arr_len, unsigned int off_dst)
 {
-	unsigned int act_len, n;
+	unsigned int act_len, n, resid;
 	struct scsi_data_buffer *sdb = &scp->sdb;
 	off_t skip = off_dst;
 
@@ -1665,13 +1677,24 @@ static int p_fill_from_dev_buffer(struct scsi_cmnd *scp, const void *arr,
 	if (scp->sc_data_direction != DMA_FROM_DEVICE)
 		return DID_ERROR << 16;
 
-	act_len = sg_pcopy_from_buffer(sdb->table.sgl, sdb->table.nents,
-				       arr, arr_len, skip);
-	pr_debug("%s: off_dst=%u, scsi_bufflen=%u, act_len=%u, resid=%d\n",
-		 __func__, off_dst, scsi_bufflen(scp), act_len,
-		 scsi_get_resid(scp));
-	n = scsi_bufflen(scp) - (off_dst + act_len);
-	scsi_set_resid(scp, min_t(u32, scsi_get_resid(scp), n));
+	/*
+	 * Conditionally suppress DATA IN transfer and leave resid set to bufflen.
+	 */
+	if (unlikely((sdebug_opts & SDEBUG_OPT_NO_DATA) &&
+		      atomic_read(&sdeb_inject_pending))) {
+		resid = scsi_bufflen(scp);
+		atomic_set(&sdeb_inject_pending, 0);
+	} else {
+		act_len = sg_pcopy_from_buffer(sdb->table.sgl, sdb->table.nents,
+					       arr, arr_len, skip);
+		pr_debug("%s: off_dst=%u, scsi_bufflen=%u, act_len=%u, resid=%d\n",
+			 __func__, off_dst, scsi_bufflen(scp), act_len,
+			 scsi_get_resid(scp));
+		n = scsi_bufflen(scp) - (off_dst + act_len);
+		resid = min_t(u32, scsi_get_resid(scp), n);
+	}
+	scsi_set_resid(scp, resid);
+
 	return 0;
 }
 
-- 
2.47.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v4 9/9] scsi: scsi_debug: Add "only_once" module option to inject an error one time
  2025-10-02 19:25 [PATCH v4 0/9] Retry READ CAPACITY(10)/(16) with good status but no data Ewan D. Milne
                   ` (7 preceding siblings ...)
  2025-10-02 19:25 ` [PATCH v4 8/9] scsi: scsi_debug: Add option to suppress returned data but return good status Ewan D. Milne
@ 2025-10-02 19:25 ` Ewan D. Milne
  2025-10-06  2:12   ` Damien Le Moal
  2025-10-06  7:12   ` Hannes Reinecke
  8 siblings, 2 replies; 34+ messages in thread
From: Ewan D. Milne @ 2025-10-02 19:25 UTC (permalink / raw)
  To: linux-scsi; +Cc: michael.christie, dgilbert, bvanassche, dlemoal, hare

The "every_nth" module option allows either periodic injection of errors
every N commands, or injection of errors on all commands after N commands.
It does not allow injection of a single error after N commands, which is
useful for testing things like code in the device probe path.

Add a new "only_once" module option that allows injection of a single error.
The "every_nth" module options are still used to control when the error
is injected, to simplify the code.

Signed-off-by: Ewan D. Milne <emilne@redhat.com>
---
 drivers/scsi/scsi_debug.c | 50 +++++++++++++++++++++++++++++++++------
 1 file changed, 43 insertions(+), 7 deletions(-)

diff --git a/drivers/scsi/scsi_debug.c b/drivers/scsi/scsi_debug.c
index 2c49aadaef80..cbe022142b60 100644
--- a/drivers/scsi/scsi_debug.c
+++ b/drivers/scsi/scsi_debug.c
@@ -134,6 +134,7 @@ static const char *sdebug_version_date = "20210520";
 #define DEF_PER_HOST_STORE false
 #define DEF_D_SENSE   0
 #define DEF_EVERY_NTH   0
+#define DEF_ONLY_ONCE 0
 #define DEF_FAKE_RW	0
 #define DEF_GUARD 0
 #define DEF_HOST_LOCK 0
@@ -909,6 +910,7 @@ static int sdebug_dif = DEF_DIF;
 static int sdebug_dix = DEF_DIX;
 static int sdebug_dsense = DEF_D_SENSE;
 static int sdebug_every_nth = DEF_EVERY_NTH;
+static bool sdebug_only_once = DEF_ONLY_ONCE;
 static int sdebug_fake_rw = DEF_FAKE_RW;
 static unsigned int sdebug_guard = DEF_GUARD;
 static int sdebug_host_max_queue;	/* per host */
@@ -1005,6 +1007,8 @@ static int dix_writes;
 static int dix_reads;
 static int dif_errors;
 
+static bool injected;
+
 /* ZBC global data */
 static bool sdeb_zbc_in_use;	/* true for host-aware and host-managed disks */
 static int sdeb_zbc_zone_cap_mb;
@@ -7161,7 +7165,7 @@ static void clear_queue_stats(void)
 
 static bool inject_on_this_cmd(void)
 {
-	if (sdebug_every_nth == 0)
+	if (sdebug_every_nth == 0 || (sdebug_only_once && injected))
 		return false;
 	return (atomic_read(&sdebug_cmnd_count) % abs(sdebug_every_nth)) == 0;
 }
@@ -7205,7 +7209,9 @@ static int schedule_resp(struct scsi_cmnd *cmnd, struct sdebug_dev_info *devip,
 
 		if ((num_in_q == qdepth) &&
 		    (atomic_inc_return(&sdebug_a_tsf) >=
-		     abs(sdebug_every_nth))) {
+		     abs(sdebug_every_nth)) &&
+		    !(sdebug_only_once && injected)) {
+			injected = true;
 			atomic_set(&sdebug_a_tsf, 0);
 			scsi_result = device_qfull_result;
 
@@ -7341,6 +7347,7 @@ module_param_named(dif, sdebug_dif, int, S_IRUGO);
 module_param_named(dix, sdebug_dix, int, S_IRUGO);
 module_param_named(dsense, sdebug_dsense, int, S_IRUGO | S_IWUSR);
 module_param_named(every_nth, sdebug_every_nth, int, S_IRUGO | S_IWUSR);
+module_param_named(only_once, sdebug_only_once, bool, S_IRUGO | S_IWUSR);
 module_param_named(fake_rw, sdebug_fake_rw, int, S_IRUGO | S_IWUSR);
 module_param_named(guard, sdebug_guard, uint, S_IRUGO);
 module_param_named(host_lock, sdebug_host_lock, bool, S_IRUGO | S_IWUSR);
@@ -7424,6 +7431,7 @@ MODULE_PARM_DESC(dif, "data integrity field type: 0-3 (def=0)");
 MODULE_PARM_DESC(dix, "data integrity extensions mask (def=0)");
 MODULE_PARM_DESC(dsense, "use descriptor sense format(def=0 -> fixed)");
 MODULE_PARM_DESC(every_nth, "timeout every nth command(def=0)");
+MODULE_PARM_DESC(only_once, "timeout only once after the nth command(def=0)");
 MODULE_PARM_DESC(fake_rw, "fake reads/writes instead of copying (def=0)");
 MODULE_PARM_DESC(guard, "protection checksum: 0=crc, 1=ip (def=0)");
 MODULE_PARM_DESC(host_lock, "host_lock is ignored (def=0)");
@@ -7567,9 +7575,9 @@ static int scsi_debug_show_info(struct seq_file *m, struct Scsi_Host *host)
 	seq_printf(m, "num_tgts=%d, %ssize=%d MB, opts=0x%x, every_nth=%d\n",
 		   sdebug_num_tgts, "shared (ram) ", sdebug_dev_size_mb,
 		   sdebug_opts, sdebug_every_nth);
-	seq_printf(m, "delay=%d, ndelay=%d, max_luns=%d, sector_size=%d %s\n",
+	seq_printf(m, "delay=%d, ndelay=%d, max_luns=%d, sector_size=%d %s, only_once=%d\n",
 		   sdebug_jdelay, sdebug_ndelay, sdebug_max_luns,
-		   sdebug_sector_size, "bytes");
+		   sdebug_sector_size, "bytes", sdebug_only_once);
 	seq_printf(m, "cylinders=%d, heads=%d, sectors=%d, command aborts=%d\n",
 		   sdebug_cylinders_per, sdebug_heads, sdebug_sectors_per,
 		   num_aborts);
@@ -7933,6 +7941,24 @@ static ssize_t every_nth_store(struct device_driver *ddp, const char *buf,
 }
 static DRIVER_ATTR_RW(every_nth);
 
+static ssize_t only_once_show(struct device_driver *ddp, char *buf)
+{
+	return scnprintf(buf, PAGE_SIZE, "%d\n", !!sdebug_only_once);
+}
+static ssize_t only_once_store(struct device_driver *ddp, const char *buf,
+			       size_t count)
+{
+	int n;
+
+	if ((count > 0) && (1 == sscanf(buf, "%d", &n)) && (n >= 0)) {
+		sdebug_only_once = (n > 0);
+		injected = false;
+		return count;
+	}
+	return -EINVAL;
+}
+static DRIVER_ATTR_RW(only_once);
+
 static ssize_t lun_format_show(struct device_driver *ddp, char *buf)
 {
 	return scnprintf(buf, PAGE_SIZE, "%d\n", (int)sdebug_lun_am);
@@ -8440,6 +8466,7 @@ static struct attribute *sdebug_drv_attrs[] = {
 	&driver_attr_dev_size_mb.attr,
 	&driver_attr_num_parts.attr,
 	&driver_attr_every_nth.attr,
+	&driver_attr_only_once.attr,
 	&driver_attr_lun_format.attr,
 	&driver_attr_max_luns.attr,
 	&driver_attr_max_queue.attr,
@@ -8997,6 +9024,9 @@ static int sdebug_change_qdepth(struct scsi_device *sdev, int qdepth)
 
 static bool fake_timeout(struct scsi_cmnd *scp)
 {
+	if (sdebug_only_once && injected)
+		return false;
+
 	if (0 == (atomic_read(&sdebug_cmnd_count) % abs(sdebug_every_nth))) {
 		if (sdebug_every_nth < -1)
 			sdebug_every_nth = -1;
@@ -9297,8 +9327,10 @@ static int scsi_debug_queuecommand(struct Scsi_Host *shost,
 		sdev_printk(KERN_INFO, sdp, "%s: tag=%#x, cmd %s\n", my_name,
 			    blk_mq_unique_tag(scsi_cmd_to_rq(scp)), b);
 	}
-	if (unlikely(inject_now && (sdebug_opts & SDEBUG_OPT_HOST_BUSY)))
+	if (unlikely(inject_now && (sdebug_opts & SDEBUG_OPT_HOST_BUSY))) {
+		injected = true;
 		return SCSI_MLQUEUE_HOST_BUSY;
+	}
 	has_wlun_rl = (sdp->lun == SCSI_W_LUN_REPORT_LUNS);
 	if (unlikely(lun_index >= sdebug_max_luns && !has_wlun_rl))
 		goto err_out;
@@ -9334,8 +9366,10 @@ static int scsi_debug_queuecommand(struct Scsi_Host *shost,
 		return ret;
 	}
 
-	if (unlikely(inject_now && !atomic_read(&sdeb_inject_pending)))
+	if (unlikely(inject_now && !atomic_read(&sdeb_inject_pending))) {
+		injected = true;
 		atomic_set(&sdeb_inject_pending, 1);
+	}
 
 	na = oip->num_attached;
 	r_pfp = oip->pfp;
@@ -9412,8 +9446,10 @@ static int scsi_debug_queuecommand(struct Scsi_Host *shost,
 	if (sdebug_fake_rw && (F_FAKE_RW & flags))
 		goto fini;
 	if (unlikely(sdebug_every_nth)) {
-		if (fake_timeout(scp))
+		if (fake_timeout(scp)) {
+			injected = true;
 			return 0;	/* ignore command: make trouble */
+		}
 	}
 	if (likely(oip->pfp))
 		pfp = oip->pfp;	/* calls a resp_* function */
-- 
2.47.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH v4 2/9] scsi: sd: Do not retry ASC 0x3a in read_capacity_10() with any ASCQ
  2025-10-02 19:25 ` [PATCH v4 2/9] scsi: sd: Do not retry ASC 0x3a in read_capacity_10() with any ASCQ Ewan D. Milne
@ 2025-10-03  4:24   ` Damien Le Moal
  2025-10-03 14:40     ` Ewan Milne
  0 siblings, 1 reply; 34+ messages in thread
From: Damien Le Moal @ 2025-10-03  4:24 UTC (permalink / raw)
  To: Ewan D. Milne, linux-scsi; +Cc: michael.christie, dgilbert, bvanassche, hare

On 10/3/25 04:25, Ewan D. Milne wrote:
> This makes the handling in read_capacity_10() consistent with other
> cases, e.g. sd_spinup_disk().  Omitting .ascq in scsi_failure did not
> result in wildcard matching, it only handled ASCQ 0x00.  This patch
> changes the retry behavior, we no longer retry 3 times on ASC 0x3a
> if a nonzero ASCQ is ever returned.
> 
> Signed-off-by: Ewan D. Milne <emilne@redhat.com>

Doesn't this need a Fixes tag ?

Other than that, looks OK to me.

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>

-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v4 1/9] scsi: Explicitly specify .ascq = 0x00 for ASC 0x28/0x29 scsi_failures
  2025-10-02 19:25 ` [PATCH v4 1/9] scsi: Explicitly specify .ascq = 0x00 for ASC 0x28/0x29 scsi_failures Ewan D. Milne
@ 2025-10-03  4:24   ` Damien Le Moal
  2025-10-06  6:59   ` Hannes Reinecke
  1 sibling, 0 replies; 34+ messages in thread
From: Damien Le Moal @ 2025-10-03  4:24 UTC (permalink / raw)
  To: Ewan D. Milne, linux-scsi; +Cc: michael.christie, dgilbert, bvanassche, hare

On 10/3/25 04:25, Ewan D. Milne wrote:
> This does not change any behavior (since .ascq was initialized to 0 by
> the compiler) but makes explicit that the entry in the scsi_failures
> array does not handle cases where ASCQ is nonzero, consistent with other
> usage.
> 
> Signed-off-by: Ewan D. Milne <emilne@redhat.com>

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>

-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v4 2/9] scsi: sd: Do not retry ASC 0x3a in read_capacity_10() with any ASCQ
  2025-10-03  4:24   ` Damien Le Moal
@ 2025-10-03 14:40     ` Ewan Milne
  2025-10-03 15:44       ` Bart Van Assche
  0 siblings, 1 reply; 34+ messages in thread
From: Ewan Milne @ 2025-10-03 14:40 UTC (permalink / raw)
  To: Damien Le Moal; +Cc: linux-scsi, michael.christie, dgilbert, bvanassche, hare

On Fri, Oct 3, 2025 at 12:24 AM Damien Le Moal <dlemoal@kernel.org> wrote:
>
> On 10/3/25 04:25, Ewan D. Milne wrote:
> > This makes the handling in read_capacity_10() consistent with other
> > cases, e.g. sd_spinup_disk().  Omitting .ascq in scsi_failure did not
> > result in wildcard matching, it only handled ASCQ 0x00.  This patch
> > changes the retry behavior, we no longer retry 3 times on ASC 0x3a
> > if a nonzero ASCQ is ever returned.
> >
> > Signed-off-by: Ewan D. Milne <emilne@redhat.com>
>
> Doesn't this need a Fixes tag ?

I don't normally add a Fixes: tag for things like this, since I don't know
if any device actually returns a nonzero ASCQ.  (I think either you or
Bart asked for this change in an earlier patch series, which is fine.)

-Ewan

>
> Other than that, looks OK to me.
>
> Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
>
> --
> Damien Le Moal
> Western Digital Research
>


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v4 2/9] scsi: sd: Do not retry ASC 0x3a in read_capacity_10() with any ASCQ
  2025-10-03 14:40     ` Ewan Milne
@ 2025-10-03 15:44       ` Bart Van Assche
  2025-10-03 18:40         ` Ewan Milne
  0 siblings, 1 reply; 34+ messages in thread
From: Bart Van Assche @ 2025-10-03 15:44 UTC (permalink / raw)
  To: Ewan Milne, Damien Le Moal; +Cc: linux-scsi, michael.christie, dgilbert, hare

On 10/3/25 7:40 AM, Ewan Milne wrote:
> On Fri, Oct 3, 2025 at 12:24 AM Damien Le Moal <dlemoal@kernel.org> wrote:
>>
>> On 10/3/25 04:25, Ewan D. Milne wrote:
>>> This makes the handling in read_capacity_10() consistent with other
>>> cases, e.g. sd_spinup_disk().  Omitting .ascq in scsi_failure did not
>>> result in wildcard matching, it only handled ASCQ 0x00.  This patch
>>> changes the retry behavior, we no longer retry 3 times on ASC 0x3a
>>> if a nonzero ASCQ is ever returned.
>>>
>>> Signed-off-by: Ewan D. Milne <emilne@redhat.com>
>>
>> Doesn't this need a Fixes tag ?
> 
> I don't normally add a Fixes: tag for things like this, since I don't know
> if any device actually returns a nonzero ASCQ.  (I think either you or
> Bart asked for this change in an earlier patch series, which is fine.)

I think that I asked for this change. From 
https://www.t10.org/lists/asc-num.htm:

3Ah/00h  DZT ROM  BK    MEDIUM NOT PRESENT
3Ah/01h  DZT ROM  BK    MEDIUM NOT PRESENT - TRAY CLOSED
3Ah/02h  DZT ROM  BK    MEDIUM NOT PRESENT - TRAY OPEN
3Ah/03h  DZT ROM  B     MEDIUM NOT PRESENT - LOADABLE
3Ah/04h  DZT RO   B     MEDIUM NOT PRESENT - MEDIUM AUXILIARY MEMORY 
ACCESSIBLE

In other words, a Fixes: tag is probably appropriate.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v4 2/9] scsi: sd: Do not retry ASC 0x3a in read_capacity_10() with any ASCQ
  2025-10-03 15:44       ` Bart Van Assche
@ 2025-10-03 18:40         ` Ewan Milne
  2025-10-06 17:54           ` Bart Van Assche
  0 siblings, 1 reply; 34+ messages in thread
From: Ewan Milne @ 2025-10-03 18:40 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Damien Le Moal, linux-scsi, michael.christie, dgilbert, hare

On Fri, Oct 3, 2025 at 11:45 AM Bart Van Assche <bvanassche@acm.org> wrote:
>
> On 10/3/25 7:40 AM, Ewan Milne wrote:
> > On Fri, Oct 3, 2025 at 12:24 AM Damien Le Moal <dlemoal@kernel.org> wrote:
> >>
> >> On 10/3/25 04:25, Ewan D. Milne wrote:
> >>> This makes the handling in read_capacity_10() consistent with other
> >>> cases, e.g. sd_spinup_disk().  Omitting .ascq in scsi_failure did not
> >>> result in wildcard matching, it only handled ASCQ 0x00.  This patch
> >>> changes the retry behavior, we no longer retry 3 times on ASC 0x3a
> >>> if a nonzero ASCQ is ever returned.
> >>>
> >>> Signed-off-by: Ewan D. Milne <emilne@redhat.com>
> >>
> >> Doesn't this need a Fixes tag ?
> >
> > I don't normally add a Fixes: tag for things like this, since I don't know
> > if any device actually returns a nonzero ASCQ.  (I think either you or
> > Bart asked for this change in an earlier patch series, which is fine.)
>
> I think that I asked for this change. From
> https://www.t10.org/lists/asc-num.htm:
>
> 3Ah/00h  DZT ROM  BK    MEDIUM NOT PRESENT
> 3Ah/01h  DZT ROM  BK    MEDIUM NOT PRESENT - TRAY CLOSED
> 3Ah/02h  DZT ROM  BK    MEDIUM NOT PRESENT - TRAY OPEN
> 3Ah/03h  DZT ROM  B     MEDIUM NOT PRESENT - LOADABLE
> 3Ah/04h  DZT RO   B     MEDIUM NOT PRESENT - MEDIUM AUXILIARY MEMORY
> ACCESSIBLE
>
> In other words, a Fixes: tag is probably appropriate.
>
> Thanks,
>
> Bart.
>

Given that c1acf38cd11ef ("scsi: sd: Have midlayer retry
sd_spinup_disk() errors")
set .ascq = SCMD_FAILURE_ASCQ_ANY for .asc = 0x3A and 0f11328f2f466
("scsi: sd: Have midlayer retry read_capacity_10() errors") didn't,
this does look
like the right change to make.  However, prior to 0f11328f2f466 there
was were retries
which Mike's patch took out, I concede it's unlikely that it
intentionally left the default
3 retries for sub-cases of ASC 0x3A.  MIke, feel free to correct me.

Martin, please add:

Fixes: 0f11328f2f466 ("scsi: sd: Have midlayer retry read_capacity_10() errors")

-Ewan


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v4 3/9] scsi: sd: Have scsi-ml retry read_capacity_16 errors
  2025-10-02 19:25 ` [PATCH v4 3/9] scsi: sd: Have scsi-ml retry read_capacity_16 errors Ewan D. Milne
@ 2025-10-06  2:00   ` Damien Le Moal
  2025-10-06  7:01   ` Hannes Reinecke
  1 sibling, 0 replies; 34+ messages in thread
From: Damien Le Moal @ 2025-10-06  2:00 UTC (permalink / raw)
  To: Ewan D. Milne, linux-scsi; +Cc: michael.christie, dgilbert, bvanassche, hare

On 10/3/25 04:25, Ewan D. Milne wrote:
> This has read_capacity_16 have scsi-ml retry errors instead of driving
> them itself.
> 
> There are 2 behavior changes with this patch:
> 1. There is one behavior change where we no longer retry when
> scsi_execute_cmd returns < 0, but we should be ok. We don't need to retry
> for failures like the queue being removed, and for the case where there
> are no tags/reqs since the block layer waits/retries for us. For possible
> memory allocation failures from blk_rq_map_kern we use GFP_NOIO, so
> retrying will probably not help.
> 2. For the specific UAs we checked for and retried, we would get
> READ_CAPACITY_RETRIES_ON_RESET retries plus whatever retries were left
> from the main loop's retries. Each UA now gets
> READ_CAPACITY_RETRIES_ON_RESET reties, and the other errors get up to 3
> retries. This is most likely ok, because READ_CAPACITY_RETRIES_ON_RESET
> is already 10 and is not based on anything specific like a spec or
> device, so the extra 3 we got from the main loop was probably just an
> accident and is not going to help.
> 
> Original patch by Mike Christie <michael.christie@oracle.com> modified
> based upon review comments for an earlier version of this patch.
> 
> Signed-off-by: Ewan D. Milne <emilne@redhat.com>
> ---
>  drivers/scsi/sd.c | 107 +++++++++++++++++++++++++++++++---------------
>  1 file changed, 73 insertions(+), 34 deletions(-)
> 
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index e3b802b26f0e..25561d01f972 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -2631,14 +2631,66 @@ static void read_capacity_error(struct scsi_disk *sdkp, struct scsi_device *sdp,
>  static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp,
>  		struct queue_limits *lim, unsigned char *buffer)
>  {
> -	unsigned char cmd[16];
> +	static const u8 cmd[16] = {
> +		[0] = SERVICE_ACTION_IN_16,
> +		[1] = SAI_READ_CAPACITY_16,
> +		[13] = RC16_LEN,
> +	};
>  	struct scsi_sense_hdr sshdr;
> +	struct scsi_failure failure_defs[] = {
> +		/*
> +		 * Do not retry Invalid Command Operation Code or Invalid
> +		 * Field in CDB.
> +		 */
> +		{
> +			.sense = ILLEGAL_REQUEST,
> +			.asc = 0x20,
> +			.ascq = 0x00,

There are many possible ascq values for asc 0x20, so may be use
SCMD_FAILURE_ASCQ_ANY here ?

> +			.result = SAM_STAT_CHECK_CONDITION,
> +		},
> +		{
> +			.sense = ILLEGAL_REQUEST,
> +			.asc = 0x24,
> +			.ascq = 0x00,

Same here.

> +			.result = SAM_STAT_CHECK_CONDITION,
> +		},
> +		/* Do not retry Medium Not Present */
> +		{
> +			.sense = UNIT_ATTENTION,
> +			.asc = 0x3A,
> +			.ascq = SCMD_FAILURE_ASCQ_ANY,
> +			.result = SAM_STAT_CHECK_CONDITION,
> +		},
> +		{
> +			.sense = NOT_READY,
> +			.asc = 0x3A,
> +			.ascq = SCMD_FAILURE_ASCQ_ANY,
> +			.result = SAM_STAT_CHECK_CONDITION,
> +		},
> +		/* Device reset might occur several times so retry a lot */
> +		{
> +			.sense = UNIT_ATTENTION,
> +			.asc = 0x29,
> +			.ascq = 0x00,

Same here too.

> +			.allowed = READ_CAPACITY_RETRIES_ON_RESET,
> +			.result = SAM_STAT_CHECK_CONDITION,
> +		},
> +		/* Any other error not listed above retry 3 times */
> +		{
> +			.result = SCMD_FAILURE_RESULT_ANY,
> +			.allowed = 3,
> +		},
> +		{}
> +	};

[...]

> -	} while (the_result && retries);
> +	if (the_result > 0) {
> +		if (media_not_present(sdkp, &sshdr))
> +			return -ENODEV;
> +
> +		sense_valid = scsi_sense_valid(&sshdr);
> +		if (sense_valid && sshdr.sense_key == ILLEGAL_REQUEST &&
> +		    (sshdr.asc == 0x20 || sshdr.asc == 0x24) &&
> +		     sshdr.ascq == 0x00) {
> +			/*
> +			 * Invalid Command Operation Code or Invalid Field in
> +			 * CDB, just retry silently with RC10

Please spell out "READ CAPACITY (10)", or "...retry silently with the 10B READ
CAPACITY command".

> +			 */
> +			return -EINVAL;
> +		}
> +	}
>  
>  	if (the_result) {
>  		sd_print_result(sdkp, "Read Capacity(16) failed", the_result);


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v4 4/9] scsi: sd: Avoid passing potentially uninitialized "sense_valid" to read_capacity_error()
  2025-10-02 19:25 ` [PATCH v4 4/9] scsi: sd: Avoid passing potentially uninitialized "sense_valid" to read_capacity_error() Ewan D. Milne
@ 2025-10-06  2:02   ` Damien Le Moal
  2025-10-06  7:06   ` Hannes Reinecke
  1 sibling, 0 replies; 34+ messages in thread
From: Damien Le Moal @ 2025-10-06  2:02 UTC (permalink / raw)
  To: Ewan D. Milne, linux-scsi; +Cc: michael.christie, dgilbert, bvanassche, hare

On 10/3/25 04:25, Ewan D. Milne wrote:
> read_capacity_10() sets "sense_valid" in a different conditional statement prior to
> calling read_capacity_error(), and does not use this value otherwise.  Move the call
> to scsi_sense_valid() to read_capacity_error() instead of passing it as a parameter
> from read_capacity_16() and read_capacity_10().
> 
> Signed-off-by: Ewan D. Milne <emilne@redhat.com>

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>

-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v4 5/9] scsi: sd: Remove checks for -EOVERFLOW in sd_read_capacity()
  2025-10-02 19:25 ` [PATCH v4 5/9] scsi: sd: Remove checks for -EOVERFLOW in sd_read_capacity() Ewan D. Milne
@ 2025-10-06  2:04   ` Damien Le Moal
  2025-10-06  7:07   ` Hannes Reinecke
  1 sibling, 0 replies; 34+ messages in thread
From: Damien Le Moal @ 2025-10-06  2:04 UTC (permalink / raw)
  To: Ewan D. Milne, linux-scsi; +Cc: michael.christie, dgilbert, bvanassche, hare

On 10/3/25 04:25, Ewan D. Milne wrote:
> Remove checks for -EOVERFLOW in sd_read_capacity() because this value has not
> been returned to it since commit 72deb455b5ec ("block: remove CONFIG_LBDAF").
> 
> Signed-off-by: Ewan D. Milne <emilne@redhat.com>

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>

-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v4 6/9] scsi: sd: Check for and retry in case of READ_CAPCITY(10)/(16) returning no data
  2025-10-02 19:25 ` [PATCH v4 6/9] scsi: sd: Check for and retry in case of READ_CAPCITY(10)/(16) returning no data Ewan D. Milne
@ 2025-10-06  2:06   ` Damien Le Moal
  2025-10-06  7:10   ` Hannes Reinecke
  1 sibling, 0 replies; 34+ messages in thread
From: Damien Le Moal @ 2025-10-06  2:06 UTC (permalink / raw)
  To: Ewan D. Milne, linux-scsi; +Cc: michael.christie, dgilbert, bvanassche, hare

On 10/3/25 04:25, Ewan D. Milne wrote:
> sd_read_capacity_10() and sd_read_capacity_16() do not check for underflow
> and can extract invalid (e.g. zero) data when a malfunctioning device does
> not actually transfer any data, but returns a good status otherwise.
> Check for this and retry the command.  Log a message and return -EINVAL if
> we can't get the capacity information.
> 
> We encountered a device that did this once but returned good data afterwards.
> 
> See similar commit 5cd3bbfad088 ("[SCSI] retry with missing data for INQUIRY")
> 
> Signed-off-by: Ewan D. Milne <emilne@redhat.com>

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v4 7/9] scsi: Simplify nested if conditional in scsi_probe_lun()
  2025-10-02 19:25 ` [PATCH v4 7/9] scsi: Simplify nested if conditional in scsi_probe_lun() Ewan D. Milne
@ 2025-10-06  2:07   ` Damien Le Moal
  2025-10-06  7:11   ` Hannes Reinecke
  1 sibling, 0 replies; 34+ messages in thread
From: Damien Le Moal @ 2025-10-06  2:07 UTC (permalink / raw)
  To: Ewan D. Milne, linux-scsi; +Cc: michael.christie, dgilbert, bvanassche, hare

On 10/3/25 04:25, Ewan D. Milne wrote:
> Make code congruent with similar code in read_capacity_16()/read_capacity_10().
> 
> Signed-off-by: Ewan D. Milne <emilne@redhat.com>

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v4 8/9] scsi: scsi_debug: Add option to suppress returned data but return good status
  2025-10-02 19:25 ` [PATCH v4 8/9] scsi: scsi_debug: Add option to suppress returned data but return good status Ewan D. Milne
@ 2025-10-06  2:09   ` Damien Le Moal
  2025-10-06  7:12   ` Hannes Reinecke
  1 sibling, 0 replies; 34+ messages in thread
From: Damien Le Moal @ 2025-10-06  2:09 UTC (permalink / raw)
  To: Ewan D. Milne, linux-scsi; +Cc: michael.christie, dgilbert, bvanassche, hare

On 10/3/25 04:25, Ewan D. Milne wrote:
> This is used to test the earlier read_capacity_10()/16() retry patch.
> 
> Signed-off-by: Ewan D. Milne <emilne@redhat.com>

Looks OK to me.

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>

-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v4 9/9] scsi: scsi_debug: Add "only_once" module option to inject an error one time
  2025-10-02 19:25 ` [PATCH v4 9/9] scsi: scsi_debug: Add "only_once" module option to inject an error one time Ewan D. Milne
@ 2025-10-06  2:12   ` Damien Le Moal
  2025-10-06  7:12   ` Hannes Reinecke
  1 sibling, 0 replies; 34+ messages in thread
From: Damien Le Moal @ 2025-10-06  2:12 UTC (permalink / raw)
  To: Ewan D. Milne, linux-scsi; +Cc: michael.christie, dgilbert, bvanassche, hare

On 10/3/25 04:25, Ewan D. Milne wrote:
> The "every_nth" module option allows either periodic injection of errors
> every N commands, or injection of errors on all commands after N commands.
> It does not allow injection of a single error after N commands, which is
> useful for testing things like code in the device probe path.
> 
> Add a new "only_once" module option that allows injection of a single error.
> The "every_nth" module options are still used to control when the error
> is injected, to simplify the code.
> 
> Signed-off-by: Ewan D. Milne <emilne@redhat.com>

Looks OK to me. One nit below.

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>


> +static ssize_t only_once_store(struct device_driver *ddp, const char *buf,
> +			       size_t count)
> +{
> +	int n;
> +
> +	if ((count > 0) && (1 == sscanf(buf, "%d", &n)) && (n >= 0)) {

Please drop the inner parentheses.

-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v4 1/9] scsi: Explicitly specify .ascq = 0x00 for ASC 0x28/0x29 scsi_failures
  2025-10-02 19:25 ` [PATCH v4 1/9] scsi: Explicitly specify .ascq = 0x00 for ASC 0x28/0x29 scsi_failures Ewan D. Milne
  2025-10-03  4:24   ` Damien Le Moal
@ 2025-10-06  6:59   ` Hannes Reinecke
  1 sibling, 0 replies; 34+ messages in thread
From: Hannes Reinecke @ 2025-10-06  6:59 UTC (permalink / raw)
  To: Ewan D. Milne, linux-scsi; +Cc: michael.christie, dgilbert, bvanassche, dlemoal

On 10/2/25 21:25, Ewan D. Milne wrote:
> This does not change any behavior (since .ascq was initialized to 0 by
> the compiler) but makes explicit that the entry in the scsi_failures
> array does not handle cases where ASCQ is nonzero, consistent with other
> usage.
> 
> Signed-off-by: Ewan D. Milne <emilne@redhat.com>
> ---
>   drivers/scsi/scsi_scan.c | 2 ++
>   drivers/scsi/sd.c        | 1 +
>   2 files changed, 3 insertions(+)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare@suse.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v4 3/9] scsi: sd: Have scsi-ml retry read_capacity_16 errors
  2025-10-02 19:25 ` [PATCH v4 3/9] scsi: sd: Have scsi-ml retry read_capacity_16 errors Ewan D. Milne
  2025-10-06  2:00   ` Damien Le Moal
@ 2025-10-06  7:01   ` Hannes Reinecke
  1 sibling, 0 replies; 34+ messages in thread
From: Hannes Reinecke @ 2025-10-06  7:01 UTC (permalink / raw)
  To: Ewan D. Milne, linux-scsi; +Cc: michael.christie, dgilbert, bvanassche, dlemoal

On 10/2/25 21:25, Ewan D. Milne wrote:
> This has read_capacity_16 have scsi-ml retry errors instead of driving
> them itself.
> 
> There are 2 behavior changes with this patch:
> 1. There is one behavior change where we no longer retry when
> scsi_execute_cmd returns < 0, but we should be ok. We don't need to retry
> for failures like the queue being removed, and for the case where there
> are no tags/reqs since the block layer waits/retries for us. For possible
> memory allocation failures from blk_rq_map_kern we use GFP_NOIO, so
> retrying will probably not help.
> 2. For the specific UAs we checked for and retried, we would get
> READ_CAPACITY_RETRIES_ON_RESET retries plus whatever retries were left
> from the main loop's retries. Each UA now gets
> READ_CAPACITY_RETRIES_ON_RESET reties, and the other errors get up to 3
> retries. This is most likely ok, because READ_CAPACITY_RETRIES_ON_RESET
> is already 10 and is not based on anything specific like a spec or
> device, so the extra 3 we got from the main loop was probably just an
> accident and is not going to help.
> 
> Original patch by Mike Christie <michael.christie@oracle.com> modified
> based upon review comments for an earlier version of this patch.
> 
> Signed-off-by: Ewan D. Milne <emilne@redhat.com>
> ---
>   drivers/scsi/sd.c | 107 +++++++++++++++++++++++++++++++---------------
>   1 file changed, 73 insertions(+), 34 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare@suse.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v4 4/9] scsi: sd: Avoid passing potentially uninitialized "sense_valid" to read_capacity_error()
  2025-10-02 19:25 ` [PATCH v4 4/9] scsi: sd: Avoid passing potentially uninitialized "sense_valid" to read_capacity_error() Ewan D. Milne
  2025-10-06  2:02   ` Damien Le Moal
@ 2025-10-06  7:06   ` Hannes Reinecke
  1 sibling, 0 replies; 34+ messages in thread
From: Hannes Reinecke @ 2025-10-06  7:06 UTC (permalink / raw)
  To: Ewan D. Milne, linux-scsi; +Cc: michael.christie, dgilbert, bvanassche, dlemoal

On 10/2/25 21:25, Ewan D. Milne wrote:
> read_capacity_10() sets "sense_valid" in a different conditional statement prior to
> calling read_capacity_error(), and does not use this value otherwise.  Move the call
> to scsi_sense_valid() to read_capacity_error() instead of passing it as a parameter
> from read_capacity_16() and read_capacity_10().
> 
> Signed-off-by: Ewan D. Milne <emilne@redhat.com>
> ---
>   drivers/scsi/sd.c | 12 +++++-------
>   1 file changed, 5 insertions(+), 7 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare@suse.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v4 5/9] scsi: sd: Remove checks for -EOVERFLOW in sd_read_capacity()
  2025-10-02 19:25 ` [PATCH v4 5/9] scsi: sd: Remove checks for -EOVERFLOW in sd_read_capacity() Ewan D. Milne
  2025-10-06  2:04   ` Damien Le Moal
@ 2025-10-06  7:07   ` Hannes Reinecke
  1 sibling, 0 replies; 34+ messages in thread
From: Hannes Reinecke @ 2025-10-06  7:07 UTC (permalink / raw)
  To: Ewan D. Milne, linux-scsi; +Cc: michael.christie, dgilbert, bvanassche, dlemoal

On 10/2/25 21:25, Ewan D. Milne wrote:
> Remove checks for -EOVERFLOW in sd_read_capacity() because this value has not
> been returned to it since commit 72deb455b5ec ("block: remove CONFIG_LBDAF").
> 
> Signed-off-by: Ewan D. Milne <emilne@redhat.com>
> ---
>   drivers/scsi/sd.c | 4 ----
>   1 file changed, 4 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare@suse.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v4 6/9] scsi: sd: Check for and retry in case of READ_CAPCITY(10)/(16) returning no data
  2025-10-02 19:25 ` [PATCH v4 6/9] scsi: sd: Check for and retry in case of READ_CAPCITY(10)/(16) returning no data Ewan D. Milne
  2025-10-06  2:06   ` Damien Le Moal
@ 2025-10-06  7:10   ` Hannes Reinecke
  2025-10-06  7:20     ` Damien Le Moal
  1 sibling, 1 reply; 34+ messages in thread
From: Hannes Reinecke @ 2025-10-06  7:10 UTC (permalink / raw)
  To: Ewan D. Milne, linux-scsi; +Cc: michael.christie, dgilbert, bvanassche, dlemoal

On 10/2/25 21:25, Ewan D. Milne wrote:
> sd_read_capacity_10() and sd_read_capacity_16() do not check for underflow
> and can extract invalid (e.g. zero) data when a malfunctioning device does
> not actually transfer any data, but returns a good status otherwise.
> Check for this and retry the command.  Log a message and return -EINVAL if
> we can't get the capacity information.
> 
> We encountered a device that did this once but returned good data afterwards.
> 
> See similar commit 5cd3bbfad088 ("[SCSI] retry with missing data for INQUIRY")
> 
> Signed-off-by: Ewan D. Milne <emilne@redhat.com>
> ---
>   drivers/scsi/sd.c | 59 ++++++++++++++++++++++++++++++++++++++++-------
>   1 file changed, 51 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index acd79e9a0d82..d1a366c9c99e 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -2638,6 +2638,7 @@ static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp,
>   		[13] = RC16_LEN,
>   	};
>   	struct scsi_sense_hdr sshdr;
> +	int count, resid;
>   	struct scsi_failure failure_defs[] = {
>   		/*
>   		 * Do not retry Invalid Command Operation Code or Invalid
> @@ -2688,6 +2689,7 @@ static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp,
>   	};
>   	const struct scsi_exec_args exec_args = {
>   		.sshdr = &sshdr,
> +		.resid = &resid,
>   		.failures = &failures,
>   	};
>   	int sense_valid = 0;
> @@ -2699,11 +2701,22 @@ static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp,
>   	if (sdp->no_read_capacity_16)
>   		return -EINVAL;
>   
> -	memset(buffer, 0, RC16_LEN);
> +	for (count = 0; count < 3; ++count) {
> +		memset(buffer, 0, RC16_LEN);
>   
> -	the_result = scsi_execute_cmd(sdp, cmd, REQ_OP_DRV_IN, buffer,
> -				      RC16_LEN, SD_TIMEOUT, sdkp->max_retries,
> -				      &exec_args);
> +		the_result = scsi_execute_cmd(sdp, cmd, REQ_OP_DRV_IN,
> +					      buffer, RC16_LEN, SD_TIMEOUT,
> +					      sdkp->max_retries, &exec_args);
> +
> +		if (the_result || resid != RC16_LEN)
> +			break;
> +
> +		/*
> +		 * If the status was good but nothing was transferred,
> +		 * we retry. It is a workaround for some buggy devices
> +		 * or SAT which sometimes do not return any data.
> +		 */
> +	}
>   
>   	if (the_result > 0) {
>   		if (media_not_present(sdkp, &sshdr))
> @@ -2727,6 +2740,12 @@ static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp,
>   		return -EINVAL;
>   	}
>   
> +	if (resid == RC16_LEN) {
> +		sd_printk(KERN_ERR, sdkp,
> +			  "Read Capacity(16) returned good status but no data");
> +		return -EINVAL;
> +	}
> +
>   	sector_size = get_unaligned_be32(&buffer[8]);
>   	lba = get_unaligned_be64(&buffer[0]);
>   
> @@ -2759,11 +2778,17 @@ static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp,
>   	return sector_size;
>   }
>   
> +#define RC10_LEN 8
> +#if RC10_LEN > SD_BUF_SIZE
> +#error RC10_LEN must not be more than SD_BUF_SIZE
> +#endif
> +
>   static int read_capacity_10(struct scsi_disk *sdkp, struct scsi_device *sdp,
>   						unsigned char *buffer)
>   {
>   	static const u8 cmd[10] = { READ_CAPACITY };
>   	struct scsi_sense_hdr sshdr;
> +	int count, resid;
>   	struct scsi_failure failure_defs[] = {
>   		/* Do not retry Medium Not Present */
>   		{
> @@ -2798,17 +2823,29 @@ static int read_capacity_10(struct scsi_disk *sdkp, struct scsi_device *sdp,
>   	};
>   	const struct scsi_exec_args exec_args = {
>   		.sshdr = &sshdr,
> +		.resid = &resid,
>   		.failures = &failures,
>   	};
>   	int the_result;
>   	sector_t lba;
>   	unsigned sector_size;
>   
> -	memset(buffer, 0, 8);
> +	for (count = 0; count < 3; ++count) {
> +		memset(buffer, 0, RC10_LEN);
> +
> +		the_result = scsi_execute_cmd(sdp, cmd, REQ_OP_DRV_IN,
> +					      buffer, RC10_LEN, SD_TIMEOUT,
> +					      sdkp->max_retries, &exec_args);
> +
> +		if (the_result || resid != RC10_LEN)
> +			break;
>   
Hmm. What would happen if some data is returned, but less than the
expected amount?
We'd be having a hard time parsing that, wouldn't we?

So I guess we should check if we had received _all_ data, no?

> -	the_result = scsi_execute_cmd(sdp, cmd, REQ_OP_DRV_IN, buffer,
> -				      8, SD_TIMEOUT, sdkp->max_retries,
> -				      &exec_args);
> +		/*
> +		 * If the status was good but nothing was transferred,
> +		 * we retry. It is a workaround for some buggy devices
> +		 * or SAT which sometimes do not return any data.
> +		 */
> +	}
>   
>   	if (the_result > 0) {
>   		if (media_not_present(sdkp, &sshdr))
> @@ -2821,6 +2858,12 @@ static int read_capacity_10(struct scsi_disk *sdkp, struct scsi_device *sdp,
>   		return -EINVAL;
>   	}
>   
> +	if (resid == RC10_LEN) {
> +		sd_printk(KERN_ERR, sdkp,
> +			  "Read Capacity(10) returned good status but no data");
> +		return -EINVAL;
> +	}
> +
>   	sector_size = get_unaligned_be32(&buffer[4]);
>   	lba = get_unaligned_be32(&buffer[0]);
>   

Similar here. I would assume that _any_ resid value larger than zero 
would render thecapacity value unreliable...

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare@suse.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v4 7/9] scsi: Simplify nested if conditional in scsi_probe_lun()
  2025-10-02 19:25 ` [PATCH v4 7/9] scsi: Simplify nested if conditional in scsi_probe_lun() Ewan D. Milne
  2025-10-06  2:07   ` Damien Le Moal
@ 2025-10-06  7:11   ` Hannes Reinecke
  1 sibling, 0 replies; 34+ messages in thread
From: Hannes Reinecke @ 2025-10-06  7:11 UTC (permalink / raw)
  To: Ewan D. Milne, linux-scsi; +Cc: michael.christie, dgilbert, bvanassche, dlemoal

On 10/2/25 21:25, Ewan D. Milne wrote:
> Make code congruent with similar code in read_capacity_16()/read_capacity_10().
> 
> Signed-off-by: Ewan D. Milne <emilne@redhat.com>
> ---
>   drivers/scsi/scsi_scan.c | 18 ++++++++----------
>   1 file changed, 8 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
> index c754b1d566e0..348ecfe5cdb0 100644
> --- a/drivers/scsi/scsi_scan.c
> +++ b/drivers/scsi/scsi_scan.c
> @@ -717,16 +717,14 @@ static int scsi_probe_lun(struct scsi_device *sdev, unsigned char *inq_result,
>   				"scsi scan: INQUIRY %s with code 0x%x\n",
>   				result ? "failed" : "successful", result));
>   
> -		if (result == 0) {
> -			/*
> -			 * if nothing was transferred, we try
> -			 * again. It's a workaround for some USB
> -			 * devices.
> -			 */
> -			if (resid == try_inquiry_len)
> -				continue;
> -		}
> -		break;
> +		if (result || resid != try_inquiry_len)
> +			break;
> +
> +		/*
> +		 * If the status was good but nothing was transferred,
> +		 * we retry. It is a workaround for some buggy devices
> +		 * or SAT which sometimes do not return any data.
> +		 */
>   	}
>   
>   	if (result == 0) {

Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare@suse.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v4 8/9] scsi: scsi_debug: Add option to suppress returned data but return good status
  2025-10-02 19:25 ` [PATCH v4 8/9] scsi: scsi_debug: Add option to suppress returned data but return good status Ewan D. Milne
  2025-10-06  2:09   ` Damien Le Moal
@ 2025-10-06  7:12   ` Hannes Reinecke
  1 sibling, 0 replies; 34+ messages in thread
From: Hannes Reinecke @ 2025-10-06  7:12 UTC (permalink / raw)
  To: Ewan D. Milne, linux-scsi; +Cc: michael.christie, dgilbert, bvanassche, dlemoal

On 10/2/25 21:25, Ewan D. Milne wrote:
> This is used to test the earlier read_capacity_10()/16() retry patch.
> 
> Signed-off-by: Ewan D. Milne <emilne@redhat.com>
> ---
>   drivers/scsi/scsi_debug.c | 49 ++++++++++++++++++++++++++++-----------
>   1 file changed, 36 insertions(+), 13 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare@suse.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v4 9/9] scsi: scsi_debug: Add "only_once" module option to inject an error one time
  2025-10-02 19:25 ` [PATCH v4 9/9] scsi: scsi_debug: Add "only_once" module option to inject an error one time Ewan D. Milne
  2025-10-06  2:12   ` Damien Le Moal
@ 2025-10-06  7:12   ` Hannes Reinecke
  1 sibling, 0 replies; 34+ messages in thread
From: Hannes Reinecke @ 2025-10-06  7:12 UTC (permalink / raw)
  To: Ewan D. Milne, linux-scsi; +Cc: michael.christie, dgilbert, bvanassche, dlemoal

On 10/2/25 21:25, Ewan D. Milne wrote:
> The "every_nth" module option allows either periodic injection of errors
> every N commands, or injection of errors on all commands after N commands.
> It does not allow injection of a single error after N commands, which is
> useful for testing things like code in the device probe path.
> 
> Add a new "only_once" module option that allows injection of a single error.
> The "every_nth" module options are still used to control when the error
> is injected, to simplify the code.
> 
> Signed-off-by: Ewan D. Milne <emilne@redhat.com>
> ---
>   drivers/scsi/scsi_debug.c | 50 +++++++++++++++++++++++++++++++++------
>   1 file changed, 43 insertions(+), 7 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare@suse.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v4 6/9] scsi: sd: Check for and retry in case of READ_CAPCITY(10)/(16) returning no data
  2025-10-06  7:10   ` Hannes Reinecke
@ 2025-10-06  7:20     ` Damien Le Moal
  2025-10-06  7:25       ` Damien Le Moal
  0 siblings, 1 reply; 34+ messages in thread
From: Damien Le Moal @ 2025-10-06  7:20 UTC (permalink / raw)
  To: Hannes Reinecke, Ewan D. Milne, linux-scsi
  Cc: michael.christie, dgilbert, bvanassche

On 10/6/25 16:10, Hannes Reinecke wrote:
>> -	memset(buffer, 0, 8);
>> +	for (count = 0; count < 3; ++count) {
>> +		memset(buffer, 0, RC10_LEN);
>> +
>> +		the_result = scsi_execute_cmd(sdp, cmd, REQ_OP_DRV_IN,
>> +					      buffer, RC10_LEN, SD_TIMEOUT,
>> +					      sdkp->max_retries, &exec_args);
>> +
>> +		if (the_result || resid != RC10_LEN)
>> +			break;
>>   
> Hmm. What would happen if some data is returned, but less than the
> expected amount?
> We'd be having a hard time parsing that, wouldn't we?

All data received would normally mean success, so result == 0.
Bad devices cases would be success with resid != 0 but less than RC10_LEN. So I
think the break is correct here since the result is checked after the loop.

> 
> So I guess we should check if we had received _all_ data, no?
> 
>> -	the_result = scsi_execute_cmd(sdp, cmd, REQ_OP_DRV_IN, buffer,
>> -				      8, SD_TIMEOUT, sdkp->max_retries,
>> -				      &exec_args);
>> +		/*
>> +		 * If the status was good but nothing was transferred,
>> +		 * we retry. It is a workaround for some buggy devices
>> +		 * or SAT which sometimes do not return any data.
>> +		 */
>> +	}
>>   
>>   	if (the_result > 0) {
>>   		if (media_not_present(sdkp, &sshdr))
>> @@ -2821,6 +2858,12 @@ static int read_capacity_10(struct scsi_disk *sdkp, struct scsi_device *sdp,
>>   		return -EINVAL;
>>   	}
>>   
>> +	if (resid == RC10_LEN) {
>> +		sd_printk(KERN_ERR, sdkp,
>> +			  "Read Capacity(10) returned good status but no data");
>> +		return -EINVAL;
>> +	}
>> +

And here we should just check that resid is 0 to make the checks complete ? This
could be included in the above check with something like:

	if (resid) {
		sd_printk(KERN_ERR, sdkp,
			"Read Capacity(10) good status but incomplete data");
		return -EINVAL;
	}

>>   	sector_size = get_unaligned_be32(&buffer[4]);
>>   	lba = get_unaligned_be32(&buffer[0]);
>>   
> 
> Similar here. I would assume that _any_ resid value larger than zero 
> would render thecapacity value unreliable...
> 
> Cheers,
> 
> Hannes


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v4 6/9] scsi: sd: Check for and retry in case of READ_CAPCITY(10)/(16) returning no data
  2025-10-06  7:20     ` Damien Le Moal
@ 2025-10-06  7:25       ` Damien Le Moal
  2025-10-06 10:59         ` Hannes Reinecke
  0 siblings, 1 reply; 34+ messages in thread
From: Damien Le Moal @ 2025-10-06  7:25 UTC (permalink / raw)
  To: Hannes Reinecke, Ewan D. Milne, linux-scsi
  Cc: michael.christie, dgilbert, bvanassche

On 10/6/25 16:20, Damien Le Moal wrote:
> On 10/6/25 16:10, Hannes Reinecke wrote:
>>> -	memset(buffer, 0, 8);
>>> +	for (count = 0; count < 3; ++count) {
>>> +		memset(buffer, 0, RC10_LEN);
>>> +
>>> +		the_result = scsi_execute_cmd(sdp, cmd, REQ_OP_DRV_IN,
>>> +					      buffer, RC10_LEN, SD_TIMEOUT,
>>> +					      sdkp->max_retries, &exec_args);
>>> +
>>> +		if (the_result || resid != RC10_LEN)
>>> +			break;
>>>   
>> Hmm. What would happen if some data is returned, but less than the
>> expected amount?
>> We'd be having a hard time parsing that, wouldn't we?
> 
> All data received would normally mean success, so result == 0.
> Bad devices cases would be success with resid != 0 but less than RC10_LEN. So I
> think the break is correct here since the result is checked after the loop.

Arg, no, bad device may return sucees with resid != 0... So yeah, together with
the below change, this should probably be:

	if (the_result || resid)
		break;

> 
>>
>> So I guess we should check if we had received _all_ data, no?
>>
>>> -	the_result = scsi_execute_cmd(sdp, cmd, REQ_OP_DRV_IN, buffer,
>>> -				      8, SD_TIMEOUT, sdkp->max_retries,
>>> -				      &exec_args);
>>> +		/*
>>> +		 * If the status was good but nothing was transferred,
>>> +		 * we retry. It is a workaround for some buggy devices
>>> +		 * or SAT which sometimes do not return any data.
>>> +		 */
>>> +	}
>>>   
>>>   	if (the_result > 0) {
>>>   		if (media_not_present(sdkp, &sshdr))
>>> @@ -2821,6 +2858,12 @@ static int read_capacity_10(struct scsi_disk *sdkp, struct scsi_device *sdp,
>>>   		return -EINVAL;
>>>   	}
>>>   
>>> +	if (resid == RC10_LEN) {
>>> +		sd_printk(KERN_ERR, sdkp,
>>> +			  "Read Capacity(10) returned good status but no data");
>>> +		return -EINVAL;
>>> +	}
>>> +
> 
> And here we should just check that resid is 0 to make the checks complete ? This
> could be included in the above check with something like:
> 
> 	if (resid) {
> 		sd_printk(KERN_ERR, sdkp,
> 			"Read Capacity(10) good status but incomplete data");
> 		return -EINVAL;
> 	}
> 
>>>   	sector_size = get_unaligned_be32(&buffer[4]);
>>>   	lba = get_unaligned_be32(&buffer[0]);
>>>   
>>
>> Similar here. I would assume that _any_ resid value larger than zero 
>> would render thecapacity value unreliable...
>>
>> Cheers,
>>
>> Hannes
> 
> 


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v4 6/9] scsi: sd: Check for and retry in case of READ_CAPCITY(10)/(16) returning no data
  2025-10-06  7:25       ` Damien Le Moal
@ 2025-10-06 10:59         ` Hannes Reinecke
  0 siblings, 0 replies; 34+ messages in thread
From: Hannes Reinecke @ 2025-10-06 10:59 UTC (permalink / raw)
  To: Damien Le Moal, Ewan D. Milne, linux-scsi
  Cc: michael.christie, dgilbert, bvanassche

On 10/6/25 09:25, Damien Le Moal wrote:
> On 10/6/25 16:20, Damien Le Moal wrote:
>> On 10/6/25 16:10, Hannes Reinecke wrote:
>>>> -	memset(buffer, 0, 8);
>>>> +	for (count = 0; count < 3; ++count) {
>>>> +		memset(buffer, 0, RC10_LEN);
>>>> +
>>>> +		the_result = scsi_execute_cmd(sdp, cmd, REQ_OP_DRV_IN,
>>>> +					      buffer, RC10_LEN, SD_TIMEOUT,
>>>> +					      sdkp->max_retries, &exec_args);
>>>> +
>>>> +		if (the_result || resid != RC10_LEN)
>>>> +			break;
>>>>    
>>> Hmm. What would happen if some data is returned, but less than the
>>> expected amount?
>>> We'd be having a hard time parsing that, wouldn't we?
>>
>> All data received would normally mean success, so result == 0.
>> Bad devices cases would be success with resid != 0 but less than RC10_LEN. So I
>> think the break is correct here since the result is checked after the loop.
> 
> Arg, no, bad device may return sucees with resid != 0... So yeah, together with
> the below change, this should probably be:
> 
> 	if (the_result || resid)
> 		break;
> 

Precisely what I tried to suggest :-)

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare@suse.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v4 2/9] scsi: sd: Do not retry ASC 0x3a in read_capacity_10() with any ASCQ
  2025-10-03 18:40         ` Ewan Milne
@ 2025-10-06 17:54           ` Bart Van Assche
  0 siblings, 0 replies; 34+ messages in thread
From: Bart Van Assche @ 2025-10-06 17:54 UTC (permalink / raw)
  To: Ewan Milne; +Cc: Damien Le Moal, linux-scsi, michael.christie, dgilbert, hare

On 10/3/25 11:40 AM, Ewan Milne wrote:
> Martin, please add:
> 
> Fixes: 0f11328f2f466 ("scsi: sd: Have midlayer retry read_capacity_10() errors")

Ewan, please rebase, retest and repost this patch series with the above
tag added after the merge window has closed. Kernel v6.17 was released
eight days ago. Hence, the merge window is still open. See also
https://lore.kernel.org/lkml/CAHk-=wiX38oG6=xFBNLO0pnjqHfxzjd6-1kZ5Nv9HfqNC2PoFA@mail.gmail.com/

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2025-10-06 17:55 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-10-02 19:25 [PATCH v4 0/9] Retry READ CAPACITY(10)/(16) with good status but no data Ewan D. Milne
2025-10-02 19:25 ` [PATCH v4 1/9] scsi: Explicitly specify .ascq = 0x00 for ASC 0x28/0x29 scsi_failures Ewan D. Milne
2025-10-03  4:24   ` Damien Le Moal
2025-10-06  6:59   ` Hannes Reinecke
2025-10-02 19:25 ` [PATCH v4 2/9] scsi: sd: Do not retry ASC 0x3a in read_capacity_10() with any ASCQ Ewan D. Milne
2025-10-03  4:24   ` Damien Le Moal
2025-10-03 14:40     ` Ewan Milne
2025-10-03 15:44       ` Bart Van Assche
2025-10-03 18:40         ` Ewan Milne
2025-10-06 17:54           ` Bart Van Assche
2025-10-02 19:25 ` [PATCH v4 3/9] scsi: sd: Have scsi-ml retry read_capacity_16 errors Ewan D. Milne
2025-10-06  2:00   ` Damien Le Moal
2025-10-06  7:01   ` Hannes Reinecke
2025-10-02 19:25 ` [PATCH v4 4/9] scsi: sd: Avoid passing potentially uninitialized "sense_valid" to read_capacity_error() Ewan D. Milne
2025-10-06  2:02   ` Damien Le Moal
2025-10-06  7:06   ` Hannes Reinecke
2025-10-02 19:25 ` [PATCH v4 5/9] scsi: sd: Remove checks for -EOVERFLOW in sd_read_capacity() Ewan D. Milne
2025-10-06  2:04   ` Damien Le Moal
2025-10-06  7:07   ` Hannes Reinecke
2025-10-02 19:25 ` [PATCH v4 6/9] scsi: sd: Check for and retry in case of READ_CAPCITY(10)/(16) returning no data Ewan D. Milne
2025-10-06  2:06   ` Damien Le Moal
2025-10-06  7:10   ` Hannes Reinecke
2025-10-06  7:20     ` Damien Le Moal
2025-10-06  7:25       ` Damien Le Moal
2025-10-06 10:59         ` Hannes Reinecke
2025-10-02 19:25 ` [PATCH v4 7/9] scsi: Simplify nested if conditional in scsi_probe_lun() Ewan D. Milne
2025-10-06  2:07   ` Damien Le Moal
2025-10-06  7:11   ` Hannes Reinecke
2025-10-02 19:25 ` [PATCH v4 8/9] scsi: scsi_debug: Add option to suppress returned data but return good status Ewan D. Milne
2025-10-06  2:09   ` Damien Le Moal
2025-10-06  7:12   ` Hannes Reinecke
2025-10-02 19:25 ` [PATCH v4 9/9] scsi: scsi_debug: Add "only_once" module option to inject an error one time Ewan D. Milne
2025-10-06  2:12   ` Damien Le Moal
2025-10-06  7:12   ` Hannes Reinecke

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).