linux-scsi.vger.kernel.org archive mirror
* [PATCH v3 0/8] Retry READ CAPACITY(10)/(16) with good status but no data
@ 2025-08-15 21:15 Ewan D. Milne
  2025-08-15 21:15 ` [PATCH v3 1/8] scsi: Explicitly specify .ascq = 0x00 for ASC 0x28/0x29 scsi_failures Ewan D. Milne
                   ` (7 more replies)
  0 siblings, 8 replies; 31+ messages in thread
From: Ewan D. Milne @ 2025-08-15 21:15 UTC (permalink / raw)
  To: linux-scsi; +Cc: michael.christie, dgilbert, bvanassche, dlemoal

We encountered a SCSI device that responded to the initial READ CAPACITY command
with GOOD status but transferred no data.  As a result, the device capacity
suddenly changed to zero when the device was rescanned, with no obvious cause.

This patch series changes read_capacity_10() and read_capacity_16() in sd.c
to retry the command up to 3 times in an attempt to get valid capacity information.
A message is logged if this is ultimately unsuccessful.
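The retry policy described above can be sketched in user space roughly as
follows.  This is only an illustration, not the sd.c code: issue_fn,
read_cap_with_retry(), RC_MAX_TRIES and the fake device are hypothetical
names, and the residual count stands in for the resid value that
scsi_execute_cmd() reports through scsi_exec_args.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical command hook: returns 0 for GOOD status and stores in
 * *resid how many of the len requested bytes were NOT transferred. */
typedef int (*issue_fn)(void *ctx, unsigned char *buf, size_t len,
			size_t *resid);

#define RC_MAX_TRIES 3	/* hypothetical name; the series retries 3 times */

/* Returns 0 with data in buf, -1 if every attempt returned GOOD status
 * with nothing transferred, or the nonzero status of a failed attempt. */
static int read_cap_with_retry(issue_fn issue, void *ctx,
			       unsigned char *buf, size_t len)
{
	size_t resid = 0;
	int result = 0;

	for (int tries = 0; tries < RC_MAX_TRIES; tries++) {
		memset(buf, 0, len);
		result = issue(ctx, buf, len, &resid);
		if (result == 0 && resid == len)
			continue;	/* good status but no data: try again */
		break;
	}
	if (result)
		return result;
	if (resid == len)
		return -1;	/* this is where the driver logs a message */
	return 0;
}

/* Fake device for the example: the first `empty` calls transfer nothing. */
struct fake_dev {
	int calls;
	int empty;
};

static int fake_issue(void *ctx, unsigned char *buf, size_t len,
		      size_t *resid)
{
	struct fake_dev *d = ctx;

	if (d->calls++ < d->empty) {
		*resid = len;	/* GOOD status, zero bytes transferred */
		return 0;
	}
	memset(buf, 0xAB, len);
	*resid = 0;
	return 0;
}
```

A device that misbehaves once (empty = 1) succeeds on the second attempt;
one that never returns data (empty >= 3) is given up on after 3 attempts.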

There are some predecessor patches.  One is taken from a series by Mike
Christie (which was not merged) and changes read_capacity_16() to use the
scsi_failures mechanism; this makes the changes here much more similar for
the read_capacity_10() and read_capacity_16() cases.  Another patch corrects
a potential use of an uninitialized variable, and a third one removes a check
for -EOVERFLOW that hasn't been needed since commit 72deb455b5ec
("block: remove CONFIG_LBDAF").  The remaining patches fill in missing .ascq
entries in the scsi_failures arrays and address other review comments.

The final patch, to scsi_debug, allows this fault to be injected in order to
test the change.

Changes in v3:
  - Removed the patch that passed the length of the buffer through the
    sd_read_capacity() call chain and adjusted the other patches accordingly.
    Use RC10_LEN/RC16_LEN for memset()
  - Removed superfluous parentheses in conditionals

Changes in v2:
  - Added patches to explicitly specify .ascq in scsi_failures usage
  - Pass the length of the buffer used through the sd_read_capacity() call chain
  - Simplify a conditional in scsi_probe_lun() that was requested in similar
    code in read_capacity_16()/read_capacity_10()
  - Changed code in scsi_debug to make only one call to scsi_set_resid()
  - Moved some declarations around in read_capacity_16()/read_capacity_10()
    and memset() the whole buffer instead of the expected data size
  - Add the newly added flag SDEBUG_NO_DATA to SDEBUG_OPT_ALL_INJECTING

Ewan D. Milne (8):
  scsi: Explicitly specify .ascq = 0x00 for ASC 0x28/0x29 scsi_failures
  scsi: sd: Explicitly specify .ascq = SCMD_FAILURE_ASCQ_ANY for ASC
    0x3a
  scsi: sd: Have scsi-ml retry read_capacity_16 errors
  scsi: sd: Avoid passing potentially uninitialized "sense_valid" to
    read_capacity_error()
  scsi: sd: Remove checks for -EOVERFLOW in sd_read_capacity()
  scsi: sd: Check for and retry in case of READ_CAPCITY(10)/(16)
    returning no data
  scsi: Simplify nested if conditional in scsi_probe_lun()
  scsi: scsi_debug: Add option to suppress returned data but return good
    status

 drivers/scsi/scsi_debug.c |  47 ++++++++---
 drivers/scsi/scsi_scan.c  |   7 +-
 drivers/scsi/sd.c         | 167 ++++++++++++++++++++++++++++----------
 3 files changed, 162 insertions(+), 59 deletions(-)

-- 
2.47.1



* [PATCH v3 1/8] scsi: Explicitly specify .ascq = 0x00 for ASC 0x28/0x29 scsi_failures
  2025-08-15 21:15 [PATCH v3 0/8] Retry READ CAPACITY(10)/(16) with good status but no data Ewan D. Milne
@ 2025-08-15 21:15 ` Ewan D. Milne
  2025-08-16  0:36   ` Damien Le Moal
                     ` (2 more replies)
  2025-08-15 21:15 ` [PATCH v3 2/8] scsi: sd: Explicitly specify .ascq = SCMD_FAILURE_ASCQ_ANY for ASC 0x3a Ewan D. Milne
                   ` (6 subsequent siblings)
  7 siblings, 3 replies; 31+ messages in thread
From: Ewan D. Milne @ 2025-08-15 21:15 UTC (permalink / raw)
  To: linux-scsi; +Cc: michael.christie, dgilbert, bvanassche, dlemoal

This does not change any behavior (since .ascq was initialized to 0 by
the compiler) but makes explicit that the entry in the scsi_failures
array does not handle cases where ASCQ is nonzero, consistent with other
usage.
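The "initialized to 0 by the compiler" point relies on standard C: members
not named in a designated initializer are zero-initialized, for automatic
arrays such as failure_defs[] as well as for static objects.  A small
illustration, using a hypothetical simplified struct (failure_sketch is not
the kernel's struct scsi_failure, although the field names mirror it):

```c
/* Hypothetical, simplified stand-in for struct scsi_failure. */
struct failure_sketch {
	int result;
	unsigned char sense;
	unsigned char asc;
	unsigned char ascq;
	int allowed;
};

/* .ascq (and .allowed) omitted: both are implicitly zero. */
static const struct failure_sketch implicit_entry = {
	.sense = 0x06,	/* UNIT ATTENTION */
	.asc = 0x29,
	.result = 0x02,	/* SAM_STAT_CHECK_CONDITION */
};

/* .ascq spelled out, as this patch does. */
static const struct failure_sketch explicit_entry = {
	.sense = 0x06,
	.asc = 0x29,
	.ascq = 0x00,
	.result = 0x02,
};

/* Field-by-field comparison of two entries. */
static int entries_equal(const struct failure_sketch *a,
			 const struct failure_sketch *b)
{
	return a->result == b->result && a->sense == b->sense &&
	       a->asc == b->asc && a->ascq == b->ascq &&
	       a->allowed == b->allowed;
}
```

Both entries compare equal, which is why the change is behavior-neutral.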

Signed-off-by: Ewan D. Milne <emilne@redhat.com>
---
 drivers/scsi/scsi_scan.c | 2 ++
 drivers/scsi/sd.c        | 1 +
 2 files changed, 3 insertions(+)

diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index 3c6e089e80c3..c754b1d566e0 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -660,11 +660,13 @@ static int scsi_probe_lun(struct scsi_device *sdev, unsigned char *inq_result,
 		{
 			.sense = UNIT_ATTENTION,
 			.asc = 0x28,
+			.ascq = 0x00,
 			.result = SAM_STAT_CHECK_CONDITION,
 		},
 		{
 			.sense = UNIT_ATTENTION,
 			.asc = 0x29,
+			.ascq = 0x00,
 			.result = SAM_STAT_CHECK_CONDITION,
 		},
 		{
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 5b8668accf8e..78f5903cc8d0 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -2740,6 +2740,7 @@ static int read_capacity_10(struct scsi_disk *sdkp, struct scsi_device *sdp,
 		{
 			.sense = UNIT_ATTENTION,
 			.asc = 0x29,
+			.ascq = 0x00,
 			.allowed = READ_CAPACITY_RETRIES_ON_RESET,
 			.result = SAM_STAT_CHECK_CONDITION,
 		},
-- 
2.47.1



* [PATCH v3 2/8] scsi: sd: Explicitly specify .ascq = SCMD_FAILURE_ASCQ_ANY for ASC 0x3a
  2025-08-15 21:15 [PATCH v3 0/8] Retry READ CAPACITY(10)/(16) with good status but no data Ewan D. Milne
  2025-08-15 21:15 ` [PATCH v3 1/8] scsi: Explicitly specify .ascq = 0x00 for ASC 0x28/0x29 scsi_failures Ewan D. Milne
@ 2025-08-15 21:15 ` Ewan D. Milne
  2025-08-16  0:37   ` Damien Le Moal
                     ` (2 more replies)
  2025-08-15 21:15 ` [PATCH v3 3/8] scsi: sd: Have scsi-ml retry read_capacity_16 errors Ewan D. Milne
                   ` (5 subsequent siblings)
  7 siblings, 3 replies; 31+ messages in thread
From: Ewan D. Milne @ 2025-08-15 21:15 UTC (permalink / raw)
  To: linux-scsi; +Cc: michael.christie, dgilbert, bvanassche, dlemoal

This makes the handling in read_capacity_10() consistent with other
cases, e.g. sd_spinup_disk().  Omitting .ascq in a scsi_failure entry did
not result in wildcard matching; it only matched ASCQ 0x00.  This patch
changes the retry behavior: we no longer retry 3 times on ASC 0x3a
when a nonzero ASCQ is returned.
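The difference between an omitted .ascq and a wildcard can be modeled with a
simplified matcher.  This is a sketch, not the scsi-ml matching code:
ASCQ_ANY_SKETCH, rule_sketch and rule_matches() are hypothetical, with
ASCQ_ANY_SKETCH playing the role of SCMD_FAILURE_ASCQ_ANY.

```c
#include <stdbool.h>

/* Hypothetical wildcard value for this sketch only. */
#define ASCQ_ANY_SKETCH 0xff

struct sense_sketch {
	unsigned char key, asc, ascq;
};

struct rule_sketch {
	unsigned char key, asc, ascq;
};

/* Sense key and ASC must match exactly; ASCQ must match exactly
 * unless the rule uses the wildcard. */
static bool rule_matches(const struct rule_sketch *r,
			 const struct sense_sketch *s)
{
	if (r->key != s->key || r->asc != s->asc)
		return false;
	return r->ascq == ASCQ_ANY_SKETCH || r->ascq == s->ascq;
}
```

With the old rule (ASCQ implicitly 0x00), MEDIUM NOT PRESENT - TRAY OPEN
(ASC 0x3a, ASCQ 0x02) falls through to the catch-all entry and is retried;
with the wildcard rule it matches and is not retried.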

Signed-off-by: Ewan D. Milne <emilne@redhat.com>
---
 drivers/scsi/sd.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 78f5903cc8d0..e3b802b26f0e 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -2729,11 +2729,13 @@ static int read_capacity_10(struct scsi_disk *sdkp, struct scsi_device *sdp,
 		{
 			.sense = UNIT_ATTENTION,
 			.asc = 0x3A,
+			.ascq = SCMD_FAILURE_ASCQ_ANY,
 			.result = SAM_STAT_CHECK_CONDITION,
 		},
 		{
 			.sense = NOT_READY,
 			.asc = 0x3A,
+			.ascq = SCMD_FAILURE_ASCQ_ANY,
 			.result = SAM_STAT_CHECK_CONDITION,
 		},
 		 /* Device reset might occur several times so retry a lot */
-- 
2.47.1



* [PATCH v3 3/8] scsi: sd: Have scsi-ml retry read_capacity_16 errors
  2025-08-15 21:15 [PATCH v3 0/8] Retry READ CAPACITY(10)/(16) with good status but no data Ewan D. Milne
  2025-08-15 21:15 ` [PATCH v3 1/8] scsi: Explicitly specify .ascq = 0x00 for ASC 0x28/0x29 scsi_failures Ewan D. Milne
  2025-08-15 21:15 ` [PATCH v3 2/8] scsi: sd: Explicitly specify .ascq = SCMD_FAILURE_ASCQ_ANY for ASC 0x3a Ewan D. Milne
@ 2025-08-15 21:15 ` Ewan D. Milne
  2025-08-16  0:42   ` Damien Le Moal
                     ` (2 more replies)
  2025-08-15 21:15 ` [PATCH v3 4/8] scsi: sd: Avoid passing potentially uninitialized "sense_valid" to read_capacity_error() Ewan D. Milne
                   ` (4 subsequent siblings)
  7 siblings, 3 replies; 31+ messages in thread
From: Ewan D. Milne @ 2025-08-15 21:15 UTC (permalink / raw)
  To: linux-scsi; +Cc: michael.christie, dgilbert, bvanassche, dlemoal

Have scsi-ml retry read_capacity_16() errors instead of having the
function drive the retries itself.

There are 2 behavior changes with this patch:
1. We no longer retry when scsi_execute_cmd() returns < 0, but this
should be OK.  We don't need to retry for failures like the queue being
removed, and for the case where there are no tags/reqs the block layer
waits/retries for us.  Memory allocations in blk_rq_map_kern() use
GFP_NOIO, so retrying after an allocation failure will probably not help.
2. For the specific UAs we checked for and retried, we used to get
READ_CAPACITY_RETRIES_ON_RESET retries plus whatever retries were left
from the main loop.  Each UA now gets READ_CAPACITY_RETRIES_ON_RESET
retries, and the other errors get up to 3 retries.  This is most likely
OK, because READ_CAPACITY_RETRIES_ON_RESET is already 10 and is not based
on anything specific like a spec or device, so the extra 3 retries from
the main loop were probably an accident and are not going to help.
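Behavior change 2 comes from the table having a separate retry budget per
entry, with the first matching entry deciding.  A simplified model of that
idea follows; it is not scsi-ml's implementation (it ignores the result
field and any overall retry limit), and entry_sketch, should_retry() and
ANY_SKETCH are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical wildcard for key/asc/ascq in this sketch. */
#define ANY_SKETCH 0xff

struct entry_sketch {
	unsigned char key, asc, ascq;
	int allowed;	/* retries permitted for errors matching this entry */
	int retries;	/* retries consumed so far */
};

static bool field_ok(unsigned char want, unsigned char got)
{
	return want == ANY_SKETCH || want == got;
}

/* The first matching entry decides; returns true if the command should
 * be retried, consuming one retry from that entry's own budget. */
static bool should_retry(struct entry_sketch *tbl, size_t n,
			 unsigned char key, unsigned char asc,
			 unsigned char ascq)
{
	for (size_t i = 0; i < n; i++) {
		struct entry_sketch *e = &tbl[i];

		if (!field_ok(e->key, key) || !field_ok(e->asc, asc) ||
		    !field_ok(e->ascq, ascq))
			continue;
		if (e->retries >= e->allowed)
			return false;
		e->retries++;
		return true;
	}
	return false;
}
```

A reset UA (0x06/0x29/0x00) now draws only on its own budget of 10 and no
longer also consumes the catch-all budget of 3.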

Original patch by Mike Christie <michael.christie@oracle.com> modified
based upon review comments for an earlier version of this patch.

Signed-off-by: Ewan D. Milne <emilne@redhat.com>
---
 drivers/scsi/sd.c | 107 +++++++++++++++++++++++++++++++---------------
 1 file changed, 73 insertions(+), 34 deletions(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index e3b802b26f0e..25561d01f972 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -2631,14 +2631,66 @@ static void read_capacity_error(struct scsi_disk *sdkp, struct scsi_device *sdp,
 static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp,
 		struct queue_limits *lim, unsigned char *buffer)
 {
-	unsigned char cmd[16];
+	static const u8 cmd[16] = {
+		[0] = SERVICE_ACTION_IN_16,
+		[1] = SAI_READ_CAPACITY_16,
+		[13] = RC16_LEN,
+	};
 	struct scsi_sense_hdr sshdr;
+	struct scsi_failure failure_defs[] = {
+		/*
+		 * Do not retry Invalid Command Operation Code or Invalid
+		 * Field in CDB.
+		 */
+		{
+			.sense = ILLEGAL_REQUEST,
+			.asc = 0x20,
+			.ascq = 0x00,
+			.result = SAM_STAT_CHECK_CONDITION,
+		},
+		{
+			.sense = ILLEGAL_REQUEST,
+			.asc = 0x24,
+			.ascq = 0x00,
+			.result = SAM_STAT_CHECK_CONDITION,
+		},
+		/* Do not retry Medium Not Present */
+		{
+			.sense = UNIT_ATTENTION,
+			.asc = 0x3A,
+			.ascq = SCMD_FAILURE_ASCQ_ANY,
+			.result = SAM_STAT_CHECK_CONDITION,
+		},
+		{
+			.sense = NOT_READY,
+			.asc = 0x3A,
+			.ascq = SCMD_FAILURE_ASCQ_ANY,
+			.result = SAM_STAT_CHECK_CONDITION,
+		},
+		/* Device reset might occur several times so retry a lot */
+		{
+			.sense = UNIT_ATTENTION,
+			.asc = 0x29,
+			.ascq = 0x00,
+			.allowed = READ_CAPACITY_RETRIES_ON_RESET,
+			.result = SAM_STAT_CHECK_CONDITION,
+		},
+		/* Any other error not listed above retry 3 times */
+		{
+			.result = SCMD_FAILURE_RESULT_ANY,
+			.allowed = 3,
+		},
+		{}
+	};
+	struct scsi_failures failures = {
+		.failure_definitions = failure_defs,
+	};
 	const struct scsi_exec_args exec_args = {
 		.sshdr = &sshdr,
+		.failures = &failures,
 	};
 	int sense_valid = 0;
 	int the_result;
-	int retries = 3, reset_retries = READ_CAPACITY_RETRIES_ON_RESET;
 	unsigned int alignment;
 	unsigned long long lba;
 	unsigned sector_size;
@@ -2646,40 +2698,27 @@ static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp,
 	if (sdp->no_read_capacity_16)
 		return -EINVAL;
 
-	do {
-		memset(cmd, 0, 16);
-		cmd[0] = SERVICE_ACTION_IN_16;
-		cmd[1] = SAI_READ_CAPACITY_16;
-		cmd[13] = RC16_LEN;
-		memset(buffer, 0, RC16_LEN);
-
-		the_result = scsi_execute_cmd(sdp, cmd, REQ_OP_DRV_IN,
-					      buffer, RC16_LEN, SD_TIMEOUT,
-					      sdkp->max_retries, &exec_args);
-		if (the_result > 0) {
-			if (media_not_present(sdkp, &sshdr))
-				return -ENODEV;
+	memset(buffer, 0, RC16_LEN);
 
-			sense_valid = scsi_sense_valid(&sshdr);
-			if (sense_valid &&
-			    sshdr.sense_key == ILLEGAL_REQUEST &&
-			    (sshdr.asc == 0x20 || sshdr.asc == 0x24) &&
-			    sshdr.ascq == 0x00)
-				/* Invalid Command Operation Code or
-				 * Invalid Field in CDB, just retry
-				 * silently with RC10 */
-				return -EINVAL;
-			if (sense_valid &&
-			    sshdr.sense_key == UNIT_ATTENTION &&
-			    sshdr.asc == 0x29 && sshdr.ascq == 0x00)
-				/* Device reset might occur several times,
-				 * give it one more chance */
-				if (--reset_retries > 0)
-					continue;
-		}
-		retries--;
+	the_result = scsi_execute_cmd(sdp, cmd, REQ_OP_DRV_IN, buffer,
+				      RC16_LEN, SD_TIMEOUT, sdkp->max_retries,
+				      &exec_args);
 
-	} while (the_result && retries);
+	if (the_result > 0) {
+		if (media_not_present(sdkp, &sshdr))
+			return -ENODEV;
+
+		sense_valid = scsi_sense_valid(&sshdr);
+		if (sense_valid && sshdr.sense_key == ILLEGAL_REQUEST &&
+		    (sshdr.asc == 0x20 || sshdr.asc == 0x24) &&
+		     sshdr.ascq == 0x00) {
+			/*
+			 * Invalid Command Operation Code or Invalid Field in
+			 * CDB, just retry silently with RC10
+			 */
+			return -EINVAL;
+		}
+	}
 
 	if (the_result) {
 		sd_print_result(sdkp, "Read Capacity(16) failed", the_result);
-- 
2.47.1



* [PATCH v3 4/8] scsi: sd: Avoid passing potentially uninitialized "sense_valid" to read_capacity_error()
  2025-08-15 21:15 [PATCH v3 0/8] Retry READ CAPACITY(10)/(16) with good status but no data Ewan D. Milne
                   ` (2 preceding siblings ...)
  2025-08-15 21:15 ` [PATCH v3 3/8] scsi: sd: Have scsi-ml retry read_capacity_16 errors Ewan D. Milne
@ 2025-08-15 21:15 ` Ewan D. Milne
  2025-08-16  0:44   ` Damien Le Moal
                     ` (2 more replies)
  2025-08-15 21:15 ` [PATCH v3 5/8] scsi: sd: Remove checks for -EOVERFLOW in sd_read_capacity() Ewan D. Milne
                   ` (3 subsequent siblings)
  7 siblings, 3 replies; 31+ messages in thread
From: Ewan D. Milne @ 2025-08-15 21:15 UTC (permalink / raw)
  To: linux-scsi; +Cc: michael.christie, dgilbert, bvanassche, dlemoal

read_capacity_10() sets "sense_valid" in a separate conditional statement before
calling read_capacity_error(), and does not otherwise use the value.  Move the call
to scsi_sense_valid() into read_capacity_error() instead of passing the result as a
parameter from read_capacity_16() and read_capacity_10().
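The pattern here is that the callee derives the flag from the sense header
itself, so no caller can hand it a stale value.  A sketch follows;
sense_hdr_sketch, sense_valid_sketch() and report_error_sketch() are
hypothetical.  The validity test is modeled on scsi_sense_valid(), which, as
far as we know, treats a response code of the form 0x7X as valid sense data.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for struct scsi_sense_hdr (only fields used here). */
struct sense_hdr_sketch {
	unsigned char response_code;
	unsigned char sense_key;
	unsigned char asc;
	unsigned char ascq;
};

/* Modeled on scsi_sense_valid(): a NULL or zeroed header is invalid. */
static bool sense_valid_sketch(const struct sense_hdr_sketch *h)
{
	return h && (h->response_code & 0x70) == 0x70;
}

/* The callee computes validity itself instead of trusting a parameter.
 * Returns 1 if the sense data would be printed, 0 if only the raw
 * result would be reported. */
static int report_error_sketch(const struct sense_hdr_sketch *h)
{
	bool sense_valid = sense_valid_sketch(h);

	return sense_valid ? 1 : 0;
}
```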

Signed-off-by: Ewan D. Milne <emilne@redhat.com>
---
 drivers/scsi/sd.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 25561d01f972..d465609a66e3 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -2597,9 +2597,10 @@ static void sd_config_protection(struct scsi_disk *sdkp,
 }
 
 static void read_capacity_error(struct scsi_disk *sdkp, struct scsi_device *sdp,
-			struct scsi_sense_hdr *sshdr, int sense_valid,
-			int the_result)
+				struct scsi_sense_hdr *sshdr, int the_result)
 {
+	bool sense_valid = scsi_sense_valid(sshdr);
+
 	if (sense_valid)
 		sd_print_sense_hdr(sdkp, sshdr);
 	else
@@ -2722,7 +2723,7 @@ static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp,
 
 	if (the_result) {
 		sd_print_result(sdkp, "Read Capacity(16) failed", the_result);
-		read_capacity_error(sdkp, sdp, &sshdr, sense_valid, the_result);
+		read_capacity_error(sdkp, sdp, &sshdr, the_result);
 		return -EINVAL;
 	}
 
@@ -2799,7 +2800,6 @@ static int read_capacity_10(struct scsi_disk *sdkp, struct scsi_device *sdp,
 		.sshdr = &sshdr,
 		.failures = &failures,
 	};
-	int sense_valid = 0;
 	int the_result;
 	sector_t lba;
 	unsigned sector_size;
@@ -2811,15 +2811,13 @@ static int read_capacity_10(struct scsi_disk *sdkp, struct scsi_device *sdp,
 				      &exec_args);
 
 	if (the_result > 0) {
-		sense_valid = scsi_sense_valid(&sshdr);
-
 		if (media_not_present(sdkp, &sshdr))
 			return -ENODEV;
 	}
 
 	if (the_result) {
 		sd_print_result(sdkp, "Read Capacity(10) failed", the_result);
-		read_capacity_error(sdkp, sdp, &sshdr, sense_valid, the_result);
+		read_capacity_error(sdkp, sdp, &sshdr, the_result);
 		return -EINVAL;
 	}
 
-- 
2.47.1



* [PATCH v3 5/8] scsi: sd: Remove checks for -EOVERFLOW in sd_read_capacity()
  2025-08-15 21:15 [PATCH v3 0/8] Retry READ CAPACITY(10)/(16) with good status but no data Ewan D. Milne
                   ` (3 preceding siblings ...)
  2025-08-15 21:15 ` [PATCH v3 4/8] scsi: sd: Avoid passing potentially uninitialized "sense_valid" to read_capacity_error() Ewan D. Milne
@ 2025-08-15 21:15 ` Ewan D. Milne
  2025-08-16  0:45   ` Damien Le Moal
                     ` (2 more replies)
  2025-08-15 21:15 ` [PATCH v3 6/8] scsi: sd: Check for and retry in case of READ_CAPCITY(10)/(16) returning no data Ewan D. Milne
                   ` (2 subsequent siblings)
  7 siblings, 3 replies; 31+ messages in thread
From: Ewan D. Milne @ 2025-08-15 21:15 UTC (permalink / raw)
  To: linux-scsi; +Cc: michael.christie, dgilbert, bvanassche, dlemoal

Remove checks for -EOVERFLOW in sd_read_capacity() because this value has not
been returned to it since commit 72deb455b5ec ("block: remove CONFIG_LBDAF").

Signed-off-by: Ewan D. Milne <emilne@redhat.com>
---
 drivers/scsi/sd.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index d465609a66e3..acd79e9a0d82 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -2863,8 +2863,6 @@ sd_read_capacity(struct scsi_disk *sdkp, struct queue_limits *lim,
 
 	if (sd_try_rc16_first(sdp)) {
 		sector_size = read_capacity_16(sdkp, sdp, lim, buffer);
-		if (sector_size == -EOVERFLOW)
-			goto got_data;
 		if (sector_size == -ENODEV)
 			return;
 		if (sector_size < 0)
@@ -2873,8 +2871,6 @@ sd_read_capacity(struct scsi_disk *sdkp, struct queue_limits *lim,
 			return;
 	} else {
 		sector_size = read_capacity_10(sdkp, sdp, buffer);
-		if (sector_size == -EOVERFLOW)
-			goto got_data;
 		if (sector_size < 0)
 			return;
 		if ((sizeof(sdkp->capacity) > 4) &&
-- 
2.47.1



* [PATCH v3 6/8] scsi: sd: Check for and retry in case of READ_CAPCITY(10)/(16) returning no data
  2025-08-15 21:15 [PATCH v3 0/8] Retry READ CAPACITY(10)/(16) with good status but no data Ewan D. Milne
                   ` (4 preceding siblings ...)
  2025-08-15 21:15 ` [PATCH v3 5/8] scsi: sd: Remove checks for -EOVERFLOW in sd_read_capacity() Ewan D. Milne
@ 2025-08-15 21:15 ` Ewan D. Milne
  2025-08-16  0:53   ` Damien Le Moal
  2025-08-20 11:44   ` Hannes Reinecke
  2025-08-15 21:15 ` [PATCH v3 7/8] scsi: Simplify nested if conditional in scsi_probe_lun() Ewan D. Milne
  2025-08-15 21:15 ` [PATCH v3 8/8] scsi: scsi_debug: Add option to suppress returned data but return good status Ewan D. Milne
  7 siblings, 2 replies; 31+ messages in thread
From: Ewan D. Milne @ 2025-08-15 21:15 UTC (permalink / raw)
  To: linux-scsi; +Cc: michael.christie, dgilbert, bvanassche, dlemoal

read_capacity_10() and read_capacity_16() do not check for underflow
and can extract invalid (e.g. zero) data when a malfunctioning device does
not actually transfer any data but otherwise returns good status.
Check for this and retry, and log a message and return -EINVAL if we can't
get the capacity information.

We encountered a device that did this once but returned good data afterwards.

See similar commit 5cd3bbfad088 ("[SCSI] retry with missing data for INQUIRY")
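Why the resid check matters: after memset() the buffer is all zeros, so
without the check a transfer of nothing decodes as last LBA 0 and block
length 0.  The sketch below decodes the two READ CAPACITY formats with the
same byte offsets the patch uses (RC10: LBA at 0, block length at 4; RC16:
LBA at 0, block length at 8).  The helper names are hypothetical, the be32/
be64 helpers stand in for get_unaligned_be32/64(), and RC16_LEN_SKETCH
assumes RC16_LEN is 32.

```c
#include <stddef.h>
#include <stdint.h>

#define RC10_LEN_SKETCH 8	/* matches the RC10_LEN added by this patch */
#define RC16_LEN_SKETCH 32	/* assumed value of RC16_LEN */

/* Big-endian loads, standing in for get_unaligned_be32/64(). */
static uint32_t be32_sketch(const unsigned char *p)
{
	return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
	       (uint32_t)p[2] << 8 | p[3];
}

static uint64_t be64_sketch(const unsigned char *p)
{
	return (uint64_t)be32_sketch(p) << 32 | be32_sketch(p + 4);
}

/* Returns -1 if nothing was transferred, else 0 with decoded values. */
static int parse_rc10_sketch(const unsigned char *buf, size_t resid,
			     uint64_t *last_lba, uint32_t *block_len)
{
	if (resid == RC10_LEN_SKETCH)
		return -1;	/* good status but no data */
	*last_lba = be32_sketch(&buf[0]);
	*block_len = be32_sketch(&buf[4]);
	return 0;
}

static int parse_rc16_sketch(const unsigned char *buf, size_t resid,
			     uint64_t *last_lba, uint32_t *block_len)
{
	if (resid == RC16_LEN_SKETCH)
		return -1;	/* good status but no data */
	*last_lba = be64_sketch(&buf[0]);
	*block_len = be32_sketch(&buf[8]);
	return 0;
}
```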

Signed-off-by: Ewan D. Milne <emilne@redhat.com>
---
 drivers/scsi/sd.c | 61 ++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 53 insertions(+), 8 deletions(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index acd79e9a0d82..6066f5c92c74 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -2638,6 +2638,7 @@ static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp,
 		[13] = RC16_LEN,
 	};
 	struct scsi_sense_hdr sshdr;
+	int count, resid;
 	struct scsi_failure failure_defs[] = {
 		/*
 		 * Do not retry Invalid Command Operation Code or Invalid
@@ -2688,6 +2689,7 @@ static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp,
 	};
 	const struct scsi_exec_args exec_args = {
 		.sshdr = &sshdr,
+		.resid = &resid,
 		.failures = &failures,
 	};
 	int sense_valid = 0;
@@ -2699,11 +2701,23 @@ static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp,
 	if (sdp->no_read_capacity_16)
 		return -EINVAL;
 
-	memset(buffer, 0, RC16_LEN);
+	for (count = 0; count < 3; ++count) {
+		memset(buffer, 0, RC16_LEN);
 
-	the_result = scsi_execute_cmd(sdp, cmd, REQ_OP_DRV_IN, buffer,
-				      RC16_LEN, SD_TIMEOUT, sdkp->max_retries,
-				      &exec_args);
+		the_result = scsi_execute_cmd(sdp, cmd, REQ_OP_DRV_IN,
+					      buffer, RC16_LEN, SD_TIMEOUT,
+					      sdkp->max_retries, &exec_args);
+
+		if (the_result == 0 && resid == RC16_LEN) {
+			/*
+			 * if nothing was transferred, we try
+			 * again. It's a workaround for a broken
+			 * device.
+			 */
+			continue;
+		}
+		break;
+	}
 
 	if (the_result > 0) {
 		if (media_not_present(sdkp, &sshdr))
@@ -2727,6 +2741,12 @@ static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp,
 		return -EINVAL;
 	}
 
+	if (resid == RC16_LEN) {
+		sd_printk(KERN_ERR, sdkp,
+			  "Read Capacity(16) returned good status but no data");
+		return -EINVAL;
+	}
+
 	sector_size = get_unaligned_be32(&buffer[8]);
 	lba = get_unaligned_be64(&buffer[0]);
 
@@ -2759,11 +2779,17 @@ static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp,
 	return sector_size;
 }
 
+#define RC10_LEN 8
+#if RC10_LEN > SD_BUF_SIZE
+#error RC10_LEN must not be more than SD_BUF_SIZE
+#endif
+
 static int read_capacity_10(struct scsi_disk *sdkp, struct scsi_device *sdp,
 						unsigned char *buffer)
 {
 	static const u8 cmd[10] = { READ_CAPACITY };
 	struct scsi_sense_hdr sshdr;
+	int count, resid;
 	struct scsi_failure failure_defs[] = {
 		/* Do not retry Medium Not Present */
 		{
@@ -2798,17 +2824,30 @@ static int read_capacity_10(struct scsi_disk *sdkp, struct scsi_device *sdp,
 	};
 	const struct scsi_exec_args exec_args = {
 		.sshdr = &sshdr,
+		.resid = &resid,
 		.failures = &failures,
 	};
 	int the_result;
 	sector_t lba;
 	unsigned sector_size;
 
-	memset(buffer, 0, 8);
+	for (count = 0; count < 3; ++count) {
+		memset(buffer, 0, RC10_LEN);
 
-	the_result = scsi_execute_cmd(sdp, cmd, REQ_OP_DRV_IN, buffer,
-				      8, SD_TIMEOUT, sdkp->max_retries,
-				      &exec_args);
+		the_result = scsi_execute_cmd(sdp, cmd, REQ_OP_DRV_IN,
+					      buffer, RC10_LEN, SD_TIMEOUT,
+					      sdkp->max_retries, &exec_args);
+
+		if (the_result == 0 && resid == RC10_LEN) {
+			/*
+			 * if nothing was transferred, we try
+			 * again. It's a workaround for a broken
+			 * device.
+			 */
+			continue;
+		}
+		break;
+	}
 
 	if (the_result > 0) {
 		if (media_not_present(sdkp, &sshdr))
@@ -2821,6 +2860,12 @@ static int read_capacity_10(struct scsi_disk *sdkp, struct scsi_device *sdp,
 		return -EINVAL;
 	}
 
+	if (resid == RC10_LEN) {
+		sd_printk(KERN_ERR, sdkp,
+			  "Read Capacity(10) returned good status but no data");
+		return -EINVAL;
+	}
+
 	sector_size = get_unaligned_be32(&buffer[4]);
 	lba = get_unaligned_be32(&buffer[0]);
 
-- 
2.47.1



* [PATCH v3 7/8] scsi: Simplify nested if conditional in scsi_probe_lun()
  2025-08-15 21:15 [PATCH v3 0/8] Retry READ CAPACITY(10)/(16) with good status but no data Ewan D. Milne
                   ` (5 preceding siblings ...)
  2025-08-15 21:15 ` [PATCH v3 6/8] scsi: sd: Check for and retry in case of READ_CAPCITY(10)/(16) returning no data Ewan D. Milne
@ 2025-08-15 21:15 ` Ewan D. Milne
  2025-08-16  0:56   ` Damien Le Moal
  2025-08-20 11:44   ` Hannes Reinecke
  2025-08-15 21:15 ` [PATCH v3 8/8] scsi: scsi_debug: Add option to suppress returned data but return good status Ewan D. Milne
  7 siblings, 2 replies; 31+ messages in thread
From: Ewan D. Milne @ 2025-08-15 21:15 UTC (permalink / raw)
  To: linux-scsi; +Cc: michael.christie, dgilbert, bvanassche, dlemoal

Make code congruent with similar code in read_capacity_16()/read_capacity_10().
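The flattened conditional is equivalent to the nested one: both "continue"
(retry) only when result == 0 and the whole buffer is residual, and "break"
otherwise.  A quick exhaustive check over a small input range; the function
names are hypothetical and model only the branch decision.

```c
#include <stdbool.h>

/* Original nested form; true means "continue" (retry). */
static bool retry_nested(int result, int resid, int len)
{
	if (result == 0) {
		if (resid == len)
			return true;
	}
	return false;
}

/* Simplified form from this patch. */
static bool retry_flat(int result, int resid, int len)
{
	if (result == 0 && resid == len)
		return true;
	return false;
}

/* Compares both forms for every result/resid pair in a small range. */
static bool forms_agree(void)
{
	for (int result = -1; result <= 2; result++)
		for (int resid = 0; resid <= 4; resid++)
			if (retry_nested(result, resid, 4) !=
			    retry_flat(result, resid, 4))
				return false;
	return true;
}
```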

Signed-off-by: Ewan D. Milne <emilne@redhat.com>
---
 drivers/scsi/scsi_scan.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index c754b1d566e0..9527b8fc5262 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -717,14 +717,13 @@ static int scsi_probe_lun(struct scsi_device *sdev, unsigned char *inq_result,
 				"scsi scan: INQUIRY %s with code 0x%x\n",
 				result ? "failed" : "successful", result));
 
-		if (result == 0) {
+		if (result == 0 && resid == try_inquiry_len) {
 			/*
 			 * if nothing was transferred, we try
 			 * again. It's a workaround for some USB
 			 * devices.
 			 */
-			if (resid == try_inquiry_len)
-				continue;
+			continue;
 		}
 		break;
 	}
-- 
2.47.1



* [PATCH v3 8/8] scsi: scsi_debug: Add option to suppress returned data but return good status
  2025-08-15 21:15 [PATCH v3 0/8] Retry READ CAPACITY(10)/(16) with good status but no data Ewan D. Milne
                   ` (6 preceding siblings ...)
  2025-08-15 21:15 ` [PATCH v3 7/8] scsi: Simplify nested if conditional in scsi_probe_lun() Ewan D. Milne
@ 2025-08-15 21:15 ` Ewan D. Milne
  2025-08-16  0:59   ` Damien Le Moal
  7 siblings, 1 reply; 31+ messages in thread
From: Ewan D. Milne @ 2025-08-15 21:15 UTC (permalink / raw)
  To: linux-scsi; +Cc: michael.christie, dgilbert, bvanassche, dlemoal

This is used to test the earlier read_capacity_10()/read_capacity_16() retry patch.
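The residual accounting in fill_from_dev_buffer() below can be modeled in
isolation.  This is a hypothetical sketch: fill_sketch() uses a flat buffer
instead of a scatterlist, and its `suppress` flag stands in for
SDEBUG_OPT_NO_DATA being set while an injection is pending.

```c
#include <stdbool.h>
#include <string.h>

/* Returns the residual count the command would report.  When suppressed,
 * nothing is copied and resid equals bufflen; otherwise resid is bufflen
 * minus the bytes actually copied. */
static unsigned int fill_sketch(unsigned char *dst, unsigned int bufflen,
				const unsigned char *src, unsigned int src_len,
				bool suppress)
{
	unsigned int act_len;

	if (suppress)
		return bufflen;
	act_len = src_len < bufflen ? src_len : bufflen;
	memcpy(dst, src, act_len);
	return bufflen - act_len;
}
```

A resid equal to the buffer length, together with GOOD status, is exactly
the condition the sd.c change checks for.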

Signed-off-by: Ewan D. Milne <emilne@redhat.com>
---
 drivers/scsi/scsi_debug.c | 47 ++++++++++++++++++++++++++++-----------
 1 file changed, 34 insertions(+), 13 deletions(-)

diff --git a/drivers/scsi/scsi_debug.c b/drivers/scsi/scsi_debug.c
index 353cb60e1abe..6239783bef21 100644
--- a/drivers/scsi/scsi_debug.c
+++ b/drivers/scsi/scsi_debug.c
@@ -230,6 +230,7 @@ struct tape_block {
 #define SDEBUG_OPT_NO_CDB_NOISE		0x4000
 #define SDEBUG_OPT_HOST_BUSY		0x8000
 #define SDEBUG_OPT_CMD_ABORT		0x10000
+#define SDEBUG_OPT_NO_DATA		0x20000
 #define SDEBUG_OPT_ALL_NOISE (SDEBUG_OPT_NOISE | SDEBUG_OPT_Q_NOISE | \
 			      SDEBUG_OPT_RESET_NOISE)
 #define SDEBUG_OPT_ALL_INJECTING (SDEBUG_OPT_RECOVERED_ERR | \
@@ -237,7 +238,8 @@ struct tape_block {
 				  SDEBUG_OPT_DIF_ERR | SDEBUG_OPT_DIX_ERR | \
 				  SDEBUG_OPT_SHORT_TRANSFER | \
 				  SDEBUG_OPT_HOST_BUSY | \
-				  SDEBUG_OPT_CMD_ABORT)
+				  SDEBUG_OPT_CMD_ABORT | \
+				  SDEBUG_OPT_NO_DATA)
 #define SDEBUG_OPT_RECOV_DIF_DIX (SDEBUG_OPT_RECOVERED_ERR | \
 				  SDEBUG_OPT_DIF_ERR | SDEBUG_OPT_DIX_ERR)
 
@@ -1633,7 +1635,7 @@ static int make_ua(struct scsi_cmnd *scp, struct sdebug_dev_info *devip)
 static int fill_from_dev_buffer(struct scsi_cmnd *scp, unsigned char *arr,
 				int arr_len)
 {
-	int act_len;
+	int act_len, resid;
 	struct scsi_data_buffer *sdb = &scp->sdb;
 
 	if (!sdb->length)
@@ -1641,9 +1643,18 @@ static int fill_from_dev_buffer(struct scsi_cmnd *scp, unsigned char *arr,
 	if (scp->sc_data_direction != DMA_FROM_DEVICE)
 		return DID_ERROR << 16;
 
-	act_len = sg_copy_from_buffer(sdb->table.sgl, sdb->table.nents,
-				      arr, arr_len);
-	scsi_set_resid(scp, scsi_bufflen(scp) - act_len);
+	/*
+	 * Conditionally suppress DATA IN transfer and leave resid set to bufflen.
+	 */
+	if (unlikely((sdebug_opts & SDEBUG_OPT_NO_DATA) &&
+		      atomic_read(&sdeb_inject_pending))) {
+		resid = scsi_bufflen(scp);
+	} else {
+		act_len = sg_copy_from_buffer(sdb->table.sgl, sdb->table.nents,
+					      arr, arr_len);
+		resid = scsi_bufflen(scp) - act_len;
+	}
+	scsi_set_resid(scp, resid);
 
 	return 0;
 }
@@ -1656,7 +1667,7 @@ static int fill_from_dev_buffer(struct scsi_cmnd *scp, unsigned char *arr,
 static int p_fill_from_dev_buffer(struct scsi_cmnd *scp, const void *arr,
 				  int arr_len, unsigned int off_dst)
 {
-	unsigned int act_len, n;
+	unsigned int act_len, n, resid;
 	struct scsi_data_buffer *sdb = &scp->sdb;
 	off_t skip = off_dst;
 
@@ -1665,13 +1676,23 @@ static int p_fill_from_dev_buffer(struct scsi_cmnd *scp, const void *arr,
 	if (scp->sc_data_direction != DMA_FROM_DEVICE)
 		return DID_ERROR << 16;
 
-	act_len = sg_pcopy_from_buffer(sdb->table.sgl, sdb->table.nents,
-				       arr, arr_len, skip);
-	pr_debug("%s: off_dst=%u, scsi_bufflen=%u, act_len=%u, resid=%d\n",
-		 __func__, off_dst, scsi_bufflen(scp), act_len,
-		 scsi_get_resid(scp));
-	n = scsi_bufflen(scp) - (off_dst + act_len);
-	scsi_set_resid(scp, min_t(u32, scsi_get_resid(scp), n));
+	/*
+	 * Conditionally suppress DATA IN transfer and leave resid set to bufflen.
+	 */
+	if (unlikely((sdebug_opts & SDEBUG_OPT_NO_DATA) &&
+		      atomic_read(&sdeb_inject_pending))) {
+		resid = scsi_bufflen(scp);
+	} else {
+		act_len = sg_pcopy_from_buffer(sdb->table.sgl, sdb->table.nents,
+					       arr, arr_len, skip);
+		pr_debug("%s: off_dst=%u, scsi_bufflen=%u, act_len=%u, resid=%d\n",
+			 __func__, off_dst, scsi_bufflen(scp), act_len,
+			 scsi_get_resid(scp));
+		n = scsi_bufflen(scp) - (off_dst + act_len);
+		resid = min_t(u32, scsi_get_resid(scp), n);
+	}
+	scsi_set_resid(scp, resid);
+
 	return 0;
 }
 
-- 
2.47.1



* Re: [PATCH v3 1/8] scsi: Explicitly specify .ascq = 0x00 for ASC 0x28/0x29 scsi_failures
  2025-08-15 21:15 ` [PATCH v3 1/8] scsi: Explicitly specify .ascq = 0x00 for ASC 0x28/0x29 scsi_failures Ewan D. Milne
@ 2025-08-16  0:36   ` Damien Le Moal
  2025-08-19 19:28   ` Bart Van Assche
  2025-08-20 11:39   ` Hannes Reinecke
  2 siblings, 0 replies; 31+ messages in thread
From: Damien Le Moal @ 2025-08-16  0:36 UTC (permalink / raw)
  To: Ewan D. Milne, linux-scsi; +Cc: michael.christie, dgilbert, bvanassche

On 8/16/25 06:15, Ewan D. Milne wrote:
> This does not change any behavior (since .ascq was initialized to 0 by
> the compiler) but makes explicit that the entry in the scsi_failures
> array does not handle cases where ASCQ is nonzero, consistent with other
> usage.
> 
> Signed-off-by: Ewan D. Milne <emilne@redhat.com>

Looks good to me.

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>

-- 
Damien Le Moal
Western Digital Research


* Re: [PATCH v3 2/8] scsi: sd: Explicitly specify .ascq = SCMD_FAILURE_ASCQ_ANY for ASC 0x3a
  2025-08-15 21:15 ` [PATCH v3 2/8] scsi: sd: Explicitly specify .ascq = SCMD_FAILURE_ASCQ_ANY for ASC 0x3a Ewan D. Milne
@ 2025-08-16  0:37   ` Damien Le Moal
  2025-08-19 19:34   ` Bart Van Assche
  2025-08-20 11:39   ` Hannes Reinecke
  2 siblings, 0 replies; 31+ messages in thread
From: Damien Le Moal @ 2025-08-16  0:37 UTC (permalink / raw)
  To: Ewan D. Milne, linux-scsi; +Cc: michael.christie, dgilbert, bvanassche

On 8/16/25 06:15, Ewan D. Milne wrote:
> This makes the handling in read_capacity_10() consistent with other
> cases, e.g. sd_spinup_disk().  Omitting .ascq in scsi_failure did not
> result in wildcard matching, it only handled ASCQ 0x00.  This patch
> changes the retry behavior, we no longer retry 3 times on ASC 0x3a
> if a nonzero ASCQ is ever returned.
> 
> Signed-off-by: Ewan D. Milne <emilne@redhat.com>

Looks OK.

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>

-- 
Damien Le Moal
Western Digital Research


* Re: [PATCH v3 3/8] scsi: sd: Have scsi-ml retry read_capacity_16 errors
  2025-08-15 21:15 ` [PATCH v3 3/8] scsi: sd: Have scsi-ml retry read_capacity_16 errors Ewan D. Milne
@ 2025-08-16  0:42   ` Damien Le Moal
  2025-08-19 19:37   ` Bart Van Assche
  2025-08-20 11:40   ` Hannes Reinecke
  2 siblings, 0 replies; 31+ messages in thread
From: Damien Le Moal @ 2025-08-16  0:42 UTC (permalink / raw)
  To: Ewan D. Milne, linux-scsi; +Cc: michael.christie, dgilbert, bvanassche

On 8/16/25 06:15, Ewan D. Milne wrote:
> This has read_capacity_16 have scsi-ml retry errors instead of driving
> them itself.
> 
> There are 2 behavior changes with this patch:
> 1. There is one behavior change where we no longer retry when
> scsi_execute_cmd returns < 0, but we should be ok. We don't need to retry
> for failures like the queue being removed, and for the case where there
> are no tags/reqs since the block layer waits/retries for us. For possible
> memory allocation failures from blk_rq_map_kern we use GFP_NOIO, so
> retrying will probably not help.
> 2. For the specific UAs we checked for and retried, we would get
> READ_CAPACITY_RETRIES_ON_RESET retries plus whatever retries were left
> from the main loop's retries. Each UA now gets
> READ_CAPACITY_RETRIES_ON_RESET retries, and the other errors get up to 3
> retries. This is most likely ok, because READ_CAPACITY_RETRIES_ON_RESET
> is already 10 and is not based on anything specific like a spec or
> device, so the extra 3 we got from the main loop was probably just an
> accident and is not going to help.
> 
> Original patch by Mike Christie <michael.christie@oracle.com> modified
> based upon review comments for an earlier version of this patch.
> 
> Signed-off-by: Ewan D. Milne <emilne@redhat.com>

Looks OK.

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>

-- 
Damien Le Moal
Western Digital Research

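The second behavior change describes per-failure retry budgets. The sketch below models that idea in isolation. Each table entry keeps its own allowance and counter, so exhausting one kind of failure does not use up another's. `struct demo_budget` and `demo_should_retry()` are hypothetical and only approximate how scsi-ml tracks retries for `struct scsi_failure` entries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical retry-budget entry: each failure kind has its own allowance. */
struct demo_budget {
	int code;	/* illustrative failure identifier */
	int allowed;	/* retries permitted for this failure */
	int used;	/* retries consumed so far */
};

/*
 * Return true if a failure with the given code should be retried and
 * charge the retry to that entry's own budget. Codes with no table
 * entry are never retried.
 */
static bool demo_should_retry(struct demo_budget *table, size_t n, int code)
{
	assert(table != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (table[i].code != code)
			continue;
		if (table[i].used >= table[i].allowed)
			return false;
		table[i].used++;
		return true;
	}
	return false;
}
```

Consider a table with a budget of 10 for the reset UA and 3 for another error code. Ten consecutive reset UAs are retried ten times, and the other error still gets its own three retries. These are the numbers given in the commit message.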

* Re: [PATCH v3 4/8] scsi: sd: Avoid passing potentially uninitialized "sense_valid" to read_capacity_error()
  2025-08-15 21:15 ` [PATCH v3 4/8] scsi: sd: Avoid passing potentially uninitialized "sense_valid" to read_capacity_error() Ewan D. Milne
@ 2025-08-16  0:44   ` Damien Le Moal
  2025-08-19 19:38   ` Bart Van Assche
  2025-08-20 11:41   ` Hannes Reinecke
  2 siblings, 0 replies; 31+ messages in thread
From: Damien Le Moal @ 2025-08-16  0:44 UTC (permalink / raw)
  To: Ewan D. Milne, linux-scsi; +Cc: michael.christie, dgilbert, bvanassche

On 8/16/25 06:15, Ewan D. Milne wrote:
> read_capacity_10() sets "sense_valid" in a different conditional statement prior to
> calling read_capacity_error(), and does not use this value otherwise.  Move the call
> to scsi_sense_valid() to read_capacity_error() instead of passing it as a parameter
> from read_capacity_16() and read_capacity_10().
> 
> Signed-off-by: Ewan D. Milne <emilne@redhat.com>

Does this maybe need a Fixes tag ?

Regardless, looks OK to me.

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>

-- 
Damien Le Moal
Western Digital Research

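Below is a sketch of the refactoring in patch 4/8. The error helper derives sense validity from the header it receives instead of taking a flag that one caller might leave unset. The types and function names are hypothetical. `demo_sense_valid()` applies the same response-code test as the kernel's `scsi_sense_valid()`: any response code of the form 0x7x, which covers the fixed (0x70/0x71) and descriptor (0x72/0x73) formats.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct scsi_sense_hdr. */
struct demo_sense_hdr {
	uint8_t response_code;
	uint8_t sense_key;
	uint8_t asc;
	uint8_t ascq;
};

/* Sense data is usable only if the response code is 0x7x. */
static bool demo_sense_valid(const struct demo_sense_hdr *sshdr)
{
	if (!sshdr)
		return false;
	return (sshdr->response_code & 0x70) == 0x70;
}

/*
 * Hypothetical error helper: returns the sense key if the header holds
 * valid sense data, or -1 otherwise. Validity is computed here rather
 * than passed in, so no caller can hand over an uninitialized flag.
 */
static int demo_error_sense_key(const struct demo_sense_hdr *sshdr)
{
	if (!demo_sense_valid(sshdr))
		return -1;
	return sshdr->sense_key;
}
```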

* Re: [PATCH v3 5/8] scsi: sd: Remove checks for -EOVERFLOW in sd_read_capacity()
  2025-08-15 21:15 ` [PATCH v3 5/8] scsi: sd: Remove checks for -EOVERFLOW in sd_read_capacity() Ewan D. Milne
@ 2025-08-16  0:45   ` Damien Le Moal
  2025-08-19 19:38   ` Bart Van Assche
  2025-08-20 11:41   ` Hannes Reinecke
  2 siblings, 0 replies; 31+ messages in thread
From: Damien Le Moal @ 2025-08-16  0:45 UTC (permalink / raw)
  To: Ewan D. Milne, linux-scsi; +Cc: michael.christie, dgilbert, bvanassche

On 8/16/25 06:15, Ewan D. Milne wrote:
> Remove checks for -EOVERFLOW in sd_read_capacity() because this value has not
> been returned to it since commit 72deb455b5ec ("block: remove CONFIG_LBDAF").
> 
> Signed-off-by: Ewan D. Milne <emilne@redhat.com>

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>


-- 
Damien Le Moal
Western Digital Research

* Re: [PATCH v3 6/8] scsi: sd: Check for and retry in case of READ_CAPCITY(10)/(16) returning no data
  2025-08-15 21:15 ` [PATCH v3 6/8] scsi: sd: Check for and retry in case of READ_CAPCITY(10)/(16) returning no data Ewan D. Milne
@ 2025-08-16  0:53   ` Damien Le Moal
  2025-08-19 19:40     ` Bart Van Assche
  2025-08-20 11:44   ` Hannes Reinecke
  1 sibling, 1 reply; 31+ messages in thread
From: Damien Le Moal @ 2025-08-16  0:53 UTC (permalink / raw)
  To: Ewan D. Milne, linux-scsi; +Cc: michael.christie, dgilbert, bvanassche

On 8/16/25 06:15, Ewan D. Milne wrote:
> sd_read_capacity_10() and sd_read_capacity_16() do not check for underflow
> and can extract invalid (e.g. zero) data when a malfunctioning device does
> not actually transfer any data, but returnes a good status otherwise.

s/returnes/returns

> Check for this and retry, and log a message and return -EINVAL if we can't
> get the capacity information.

Hmmm. A little unclear explanation: "and retry, and log a message and return
-EINVAL"... hard to parse.

> 
> We encountered a device that did this once but returned good data afterwards.
> 
> See similar commit 5cd3bbfad088 ("[SCSI] retry with missing data for INQUIRY")
> 
> Signed-off-by: Ewan D. Milne <emilne@redhat.com>

A couple of nits below to make the code cleaner.

With that,

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>

> ---
>  drivers/scsi/sd.c | 61 ++++++++++++++++++++++++++++++++++++++++-------
>  1 file changed, 53 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index acd79e9a0d82..6066f5c92c74 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -2638,6 +2638,7 @@ static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp,
>  		[13] = RC16_LEN,
>  	};
>  	struct scsi_sense_hdr sshdr;
> +	int count, resid;
>  	struct scsi_failure failure_defs[] = {
>  		/*
>  		 * Do not retry Invalid Command Operation Code or Invalid
> @@ -2688,6 +2689,7 @@ static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp,
>  	};
>  	const struct scsi_exec_args exec_args = {
>  		.sshdr = &sshdr,
> +		.resid = &resid,
>  		.failures = &failures,
>  	};
>  	int sense_valid = 0;
> @@ -2699,11 +2701,23 @@ static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp,
>  	if (sdp->no_read_capacity_16)
>  		return -EINVAL;
>  
> -	memset(buffer, 0, RC16_LEN);
> +	for (count = 0; count < 3; ++count) {
> +		memset(buffer, 0, RC16_LEN);
>  
> -	the_result = scsi_execute_cmd(sdp, cmd, REQ_OP_DRV_IN, buffer,
> -				      RC16_LEN, SD_TIMEOUT, sdkp->max_retries,
> -				      &exec_args);
> +		the_result = scsi_execute_cmd(sdp, cmd, REQ_OP_DRV_IN,
> +					      buffer, RC16_LEN, SD_TIMEOUT,
> +					      sdkp->max_retries, &exec_args);
> +
> +		if (the_result == 0 && resid == RC16_LEN) {
> +			/*
> +			 * if nothing was transferred, we try
> +			 * again. It's a workaround for a broken
> +			 * device.
> +			 */
> +			continue;
> +		}
> +		break;

Maybe reverse the condition to avoid this break and the continue ? E.g.:

		/*
		 * If nothing was transferred, we try again. It is a workaround
		 * for some buggy devices or SAT which sometimes do not return
		 * data on the first try.
		 */
		if (the_result || resid != RC16_LEN)
			break;

I find this simpler and cleaner :)

> +	}
>  
>  	if (the_result > 0) {
>  		if (media_not_present(sdkp, &sshdr))
> @@ -2727,6 +2741,12 @@ static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp,
>  		return -EINVAL;
>  	}
>  
> +	if (resid == RC16_LEN) {
> +		sd_printk(KERN_ERR, sdkp,
> +			  "Read Capacity(16) returned good status but no data");
> +		return -EINVAL;
> +	}
> +
>  	sector_size = get_unaligned_be32(&buffer[8]);
>  	lba = get_unaligned_be64(&buffer[0]);
>  

[...]

> @@ -2798,17 +2824,30 @@ static int read_capacity_10(struct scsi_disk *sdkp, struct scsi_device *sdp,
>  	};
>  	const struct scsi_exec_args exec_args = {
>  		.sshdr = &sshdr,
> +		.resid = &resid,
>  		.failures = &failures,
>  	};
>  	int the_result;
>  	sector_t lba;
>  	unsigned sector_size;
>  
> -	memset(buffer, 0, 8);
> +	for (count = 0; count < 3; ++count) {
> +		memset(buffer, 0, RC10_LEN);
>  
> -	the_result = scsi_execute_cmd(sdp, cmd, REQ_OP_DRV_IN, buffer,
> -				      8, SD_TIMEOUT, sdkp->max_retries,
> -				      &exec_args);
> +		the_result = scsi_execute_cmd(sdp, cmd, REQ_OP_DRV_IN,
> +					      buffer, RC10_LEN, SD_TIMEOUT,
> +					      sdkp->max_retries, &exec_args);
> +
> +		if (the_result == 0 && resid == RC10_LEN) {
> +			/*
> +			 * if nothing was transferred, we try
> +			 * again. It's a workaround for a broken
> +			 * device.
> +			 */
> +			continue;
> +		}
> +		break;

Same suggestion here as above.

> +	}
>  
>  	if (the_result > 0) {
>  		if (media_not_present(sdkp, &sshdr))
> @@ -2821,6 +2860,12 @@ static int read_capacity_10(struct scsi_disk *sdkp, struct scsi_device *sdp,
>  		return -EINVAL;
>  	}
>  
> +	if (resid == RC10_LEN) {
> +		sd_printk(KERN_ERR, sdkp,
> +			  "Read Capacity(10) returned good status but no data");
> +		return -EINVAL;
> +	}
> +
>  	sector_size = get_unaligned_be32(&buffer[4]);
>  	lba = get_unaligned_be32(&buffer[0]);
>  


-- 
Damien Le Moal
Western Digital Research

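The retry logic of patch 6/8 can be exercised outside the kernel with a fake device. This sketch uses the loop shape Damien suggests: break unless the command succeeded but transferred nothing. The patch's post-loop `resid` check follows the loop. Several details are assumptions:

- `demo_execute()` is a hypothetical stand-in for `scsi_execute_cmd()`.
- The 32-byte length is assumed to match `RC16_LEN`.
- Returning `-EIO` for a failed command is a simplification. The real function handles non-zero results through its own error paths.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define DEMO_RC16_LEN 32	/* assumed to match RC16_LEN in sd.c */
#define DEMO_MAX_TRIES 3	/* the patch makes up to 3 attempts */

/* Hypothetical fake device: good status but no data for the first
 * 'empty_replies' commands, then real data. */
struct demo_dev {
	int empty_replies;
	int calls;
};

/* Stand-in for scsi_execute_cmd(): returns 0 (good status) and reports
 * how many bytes were NOT transferred through *resid. */
static int demo_execute(struct demo_dev *dev, unsigned char *buf, int len,
			int *resid)
{
	dev->calls++;
	if (dev->empty_replies > 0) {
		dev->empty_replies--;
		*resid = len;		/* nothing transferred */
		return 0;
	}
	memset(buf, 0xab, len);		/* pretend capacity data arrived */
	*resid = 0;
	return 0;
}

static int demo_read_capacity(struct demo_dev *dev, unsigned char *buf)
{
	int the_result = 0, resid = 0;

	assert(buf != NULL);
	for (int count = 0; count < DEMO_MAX_TRIES; ++count) {
		memset(buf, 0, DEMO_RC16_LEN);
		the_result = demo_execute(dev, buf, DEMO_RC16_LEN, &resid);
		/* Retry only when status was good but no data came back. */
		if (the_result || resid != DEMO_RC16_LEN)
			break;
	}

	if (the_result)
		return -EIO;	/* simplified error handling */
	if (resid == DEMO_RC16_LEN)
		return -EINVAL;	/* still no data after all attempts */
	return 0;
}
```

A device that returns one empty reply succeeds on the second attempt. A device that never returns data fails with `-EINVAL` after three attempts.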

* Re: [PATCH v3 7/8] scsi: Simplify nested if conditional in scsi_probe_lun()
  2025-08-15 21:15 ` [PATCH v3 7/8] scsi: Simplify nested if conditional in scsi_probe_lun() Ewan D. Milne
@ 2025-08-16  0:56   ` Damien Le Moal
  2025-08-20 11:44   ` Hannes Reinecke
  1 sibling, 0 replies; 31+ messages in thread
From: Damien Le Moal @ 2025-08-16  0:56 UTC (permalink / raw)
  To: Ewan D. Milne, linux-scsi; +Cc: michael.christie, dgilbert, bvanassche

On 8/16/25 06:15, Ewan D. Milne wrote:
> Make code congruent with similar code in read_capacity_16()/read_capacity_10().
> 
> Signed-off-by: Ewan D. Milne <emilne@redhat.com>
> ---
>  drivers/scsi/scsi_scan.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
> index c754b1d566e0..9527b8fc5262 100644
> --- a/drivers/scsi/scsi_scan.c
> +++ b/drivers/scsi/scsi_scan.c
> @@ -717,14 +717,13 @@ static int scsi_probe_lun(struct scsi_device *sdev, unsigned char *inq_result,
>  				"scsi scan: INQUIRY %s with code 0x%x\n",
>  				result ? "failed" : "successful", result));
>  
> -		if (result == 0) {
> +		if (result == 0 && resid == try_inquiry_len) {
>  			/*
>  			 * if nothing was transferred, we try
>  			 * again. It's a workaround for some USB
>  			 * devices.
>  			 */
> -			if (resid == try_inquiry_len)
> -				continue;
> +			continue;
>  		}

Maybe make this:

		/*
		 * If nothing was transferred, we try again. It is a workaround
		 * for some buggy USB devices.
		 */
		if (result == 0 && resid == try_inquiry_len)
			continue;

With that,

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>

-- 
Damien Le Moal
Western Digital Research

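Patch 7/8 folds the nested test into one condition. The outer `if (result == 0)` block contains nothing but the inner `if`, so the two forms take the same branch for every input. The sketch below checks that exhaustively over a small range. The function names and the 36-byte length are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>

/* Original shape: nested test inside if (result == 0). */
static bool retry_nested(int result, int resid, int try_len)
{
	if (result == 0) {
		if (resid == try_len)
			return true;	/* "continue": retry the INQUIRY */
	}
	return false;
}

/* Shape after the patch: one combined condition. */
static bool retry_flat(int result, int resid, int try_len)
{
	return result == 0 && resid == try_len;
}

/* Compare both forms over a small input range. */
static bool forms_agree(void)
{
	for (int result = -1; result <= 1; result++)
		for (int resid = 0; resid <= 36; resid++)
			if (retry_nested(result, resid, 36) !=
			    retry_flat(result, resid, 36))
				return false;
	return true;
}
```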

* Re: [PATCH v3 8/8] scsi: scsi_debug: Add option to suppress returned data but return good status
  2025-08-15 21:15 ` [PATCH v3 8/8] scsi: scsi_debug: Add option to suppress returned data but return good status Ewan D. Milne
@ 2025-08-16  0:59   ` Damien Le Moal
  2025-08-20 11:45     ` Hannes Reinecke
  0 siblings, 1 reply; 31+ messages in thread
From: Damien Le Moal @ 2025-08-16  0:59 UTC (permalink / raw)
  To: Ewan D. Milne, linux-scsi; +Cc: michael.christie, dgilbert, bvanassche

On 8/16/25 06:15, Ewan D. Milne wrote:
> This is used to test the earlier read_capacity_10()/16() retry patch.
> 
> Signed-off-by: Ewan D. Milne <emilne@redhat.com>

Looks OK, but it would be nice to be able to suppress the data only for the
first X commands, so that the retries can be exercised with a success in them
instead of all of them failing to give data.

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>


-- 
Damien Le Moal
Western Digital Research


* Re: [PATCH v3 1/8] scsi: Explicitly specify .ascq = 0x00 for ASC 0x28/0x29 scsi_failures
  2025-08-15 21:15 ` [PATCH v3 1/8] scsi: Explicitly specify .ascq = 0x00 for ASC 0x28/0x29 scsi_failures Ewan D. Milne
  2025-08-16  0:36   ` Damien Le Moal
@ 2025-08-19 19:28   ` Bart Van Assche
  2025-08-20 11:39   ` Hannes Reinecke
  2 siblings, 0 replies; 31+ messages in thread
From: Bart Van Assche @ 2025-08-19 19:28 UTC (permalink / raw)
  To: Ewan D. Milne, linux-scsi; +Cc: michael.christie, dgilbert, dlemoal

On 8/15/25 2:15 PM, Ewan D. Milne wrote:
> This does not change any behavior (since .ascq was initialized to 0 by
> the compiler) but makes explicit that the entry in the scsi_failures
> array does not handle cases where ASCQ is nonzero, consistent with other
> usage.
Reviewed-by: Bart Van Assche <bvanassche@acm.org>

* Re: [PATCH v3 2/8] scsi: sd: Explicitly specify .ascq = SCMD_FAILURE_ASCQ_ANY for ASC 0x3a
  2025-08-15 21:15 ` [PATCH v3 2/8] scsi: sd: Explicitly specify .ascq = SCMD_FAILURE_ASCQ_ANY for ASC 0x3a Ewan D. Milne
  2025-08-16  0:37   ` Damien Le Moal
@ 2025-08-19 19:34   ` Bart Van Assche
  2025-08-20 11:39   ` Hannes Reinecke
  2 siblings, 0 replies; 31+ messages in thread
From: Bart Van Assche @ 2025-08-19 19:34 UTC (permalink / raw)
  To: Ewan D. Milne, linux-scsi; +Cc: michael.christie, dgilbert, dlemoal

On 8/15/25 2:15 PM, Ewan D. Milne wrote:
> This makes the handling in read_capacity_10() consistent with other
> cases, e.g. sd_spinup_disk().  Omitting .ascq in scsi_failure did not
> result in wildcard matching, it only handled ASCQ 0x00.  This patch
> changes the retry behavior, we no longer retry 3 times on ASC 0x3a
> if a nonzero ASCQ is ever returned.
> 
> Signed-off-by: Ewan D. Milne <emilne@redhat.com>
> ---
>   drivers/scsi/sd.c | 2 ++
>   1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index 78f5903cc8d0..e3b802b26f0e 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -2729,11 +2729,13 @@ static int read_capacity_10(struct scsi_disk *sdkp, struct scsi_device *sdp,
>   		{
>   			.sense = UNIT_ATTENTION,
>   			.asc = 0x3A,
> +			.ascq = SCMD_FAILURE_ASCQ_ANY,
>   			.result = SAM_STAT_CHECK_CONDITION,
>   		},
>   		{
>   			.sense = NOT_READY,
>   			.asc = 0x3A,
> +			.ascq = SCMD_FAILURE_ASCQ_ANY,
>   			.result = SAM_STAT_CHECK_CONDITION,
>   		},
>   		 /* Device reset might occur several times so retry a lot */

If this patch is reposted, please consider shortening the title of this 
patch to e.g. "Do not retry ASC 0x3a". Anyway:

Reviewed-by: Bart Van Assche <bvanassche@acm.org>

* Re: [PATCH v3 3/8] scsi: sd: Have scsi-ml retry read_capacity_16 errors
  2025-08-15 21:15 ` [PATCH v3 3/8] scsi: sd: Have scsi-ml retry read_capacity_16 errors Ewan D. Milne
  2025-08-16  0:42   ` Damien Le Moal
@ 2025-08-19 19:37   ` Bart Van Assche
  2025-08-20 11:40   ` Hannes Reinecke
  2 siblings, 0 replies; 31+ messages in thread
From: Bart Van Assche @ 2025-08-19 19:37 UTC (permalink / raw)
  To: Ewan D. Milne, linux-scsi; +Cc: michael.christie, dgilbert, dlemoal

On 8/15/25 2:15 PM, Ewan D. Milne wrote:
> This has read_capacity_16 have scsi-ml retry errors instead of driving
> them itself.
> 
> There are 2 behavior changes with this patch:
> 1. There is one behavior change where we no longer retry when
> scsi_execute_cmd returns < 0, but we should be ok. We don't need to retry
> for failures like the queue being removed, and for the case where there
> are no tags/reqs since the block layer waits/retries for us. For possible
> memory allocation failures from blk_rq_map_kern we use GFP_NOIO, so
> retrying will probably not help.
> 2. For the specific UAs we checked for and retried, we would get
> READ_CAPACITY_RETRIES_ON_RESET retries plus whatever retries were left
> from the main loop's retries. Each UA now gets
> READ_CAPACITY_RETRIES_ON_RESET retries, and the other errors get up to 3
> retries. This is most likely ok, because READ_CAPACITY_RETRIES_ON_RESET
> is already 10 and is not based on anything specific like a spec or
> device, so the extra 3 we got from the main loop was probably just an
> accident and is not going to help.
> 
> Original patch by Mike Christie <michael.christie@oracle.com> modified
> based upon review comments for an earlier version of this patch.

Reviewed-by: Bart Van Assche <bvanassche@acm.org>

* Re: [PATCH v3 4/8] scsi: sd: Avoid passing potentially uninitialized "sense_valid" to read_capacity_error()
  2025-08-15 21:15 ` [PATCH v3 4/8] scsi: sd: Avoid passing potentially uninitialized "sense_valid" to read_capacity_error() Ewan D. Milne
  2025-08-16  0:44   ` Damien Le Moal
@ 2025-08-19 19:38   ` Bart Van Assche
  2025-08-20 11:41   ` Hannes Reinecke
  2 siblings, 0 replies; 31+ messages in thread
From: Bart Van Assche @ 2025-08-19 19:38 UTC (permalink / raw)
  To: Ewan D. Milne, linux-scsi; +Cc: michael.christie, dgilbert, dlemoal

On 8/15/25 2:15 PM, Ewan D. Milne wrote:
> read_capacity_10() sets "sense_valid" in a different conditional statement prior to
> calling read_capacity_error(), and does not use this value otherwise.  Move the call
> to scsi_sense_valid() to read_capacity_error() instead of passing it as a parameter
> from read_capacity_16() and read_capacity_10().
Reviewed-by: Bart Van Assche <bvanassche@acm.org>

* Re: [PATCH v3 5/8] scsi: sd: Remove checks for -EOVERFLOW in sd_read_capacity()
  2025-08-15 21:15 ` [PATCH v3 5/8] scsi: sd: Remove checks for -EOVERFLOW in sd_read_capacity() Ewan D. Milne
  2025-08-16  0:45   ` Damien Le Moal
@ 2025-08-19 19:38   ` Bart Van Assche
  2025-08-20 11:41   ` Hannes Reinecke
  2 siblings, 0 replies; 31+ messages in thread
From: Bart Van Assche @ 2025-08-19 19:38 UTC (permalink / raw)
  To: Ewan D. Milne, linux-scsi; +Cc: michael.christie, dgilbert, dlemoal

On 8/15/25 2:15 PM, Ewan D. Milne wrote:
> Remove checks for -EOVERFLOW in sd_read_capacity() because this value has not
> been returned to it since commit 72deb455b5ec ("block: remove CONFIG_LBDAF").

Reviewed-by: Bart Van Assche <bvanassche@acm.org>

* Re: [PATCH v3 6/8] scsi: sd: Check for and retry in case of READ_CAPCITY(10)/(16) returning no data
  2025-08-16  0:53   ` Damien Le Moal
@ 2025-08-19 19:40     ` Bart Van Assche
  0 siblings, 0 replies; 31+ messages in thread
From: Bart Van Assche @ 2025-08-19 19:40 UTC (permalink / raw)
  To: Damien Le Moal, Ewan D. Milne, linux-scsi; +Cc: michael.christie, dgilbert

On 8/15/25 5:53 PM, Damien Le Moal wrote:
>> +		if (the_result == 0 && resid == RC16_LEN) {
>> +			/*
>> +			 * if nothing was transferred, we try
>> +			 * again. It's a workaround for a broken
>> +			 * device.
>> +			 */
>> +			continue;
>> +		}
>> +		break;
> 
> Maybe reverse the condition to avoid this break and the continue ? E.g.:
> 
> 		/*
> 		 * If nothing was transferred, we try again. It is a workaround
> 		 * for some buggy devices or SAT which sometimes do not return
> 		 * data on the first try.
> 		 */
> 		if (the_result || resid != RC16_LEN)
> 			break;
> 
> I find this simpler and cleaner :)

+1

Bart.

* Re: [PATCH v3 1/8] scsi: Explicitly specify .ascq = 0x00 for ASC 0x28/0x29 scsi_failures
  2025-08-15 21:15 ` [PATCH v3 1/8] scsi: Explicitly specify .ascq = 0x00 for ASC 0x28/0x29 scsi_failures Ewan D. Milne
  2025-08-16  0:36   ` Damien Le Moal
  2025-08-19 19:28   ` Bart Van Assche
@ 2025-08-20 11:39   ` Hannes Reinecke
  2 siblings, 0 replies; 31+ messages in thread
From: Hannes Reinecke @ 2025-08-20 11:39 UTC (permalink / raw)
  To: Ewan D. Milne, linux-scsi; +Cc: michael.christie, dgilbert, bvanassche, dlemoal

On 8/15/25 23:15, Ewan D. Milne wrote:
> This does not change any behavior (since .ascq was initialized to 0 by
> the compiler) but makes explicit that the entry in the scsi_failures
> array does not handle cases where ASCQ is nonzero, consistent with other
> usage.
> 
> Signed-off-by: Ewan D. Milne <emilne@redhat.com>
> ---
>   drivers/scsi/scsi_scan.c | 2 ++
>   drivers/scsi/sd.c        | 1 +
>   2 files changed, 3 insertions(+)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare@suse.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich

* Re: [PATCH v3 2/8] scsi: sd: Explicitly specify .ascq = SCMD_FAILURE_ASCQ_ANY for ASC 0x3a
  2025-08-15 21:15 ` [PATCH v3 2/8] scsi: sd: Explicitly specify .ascq = SCMD_FAILURE_ASCQ_ANY for ASC 0x3a Ewan D. Milne
  2025-08-16  0:37   ` Damien Le Moal
  2025-08-19 19:34   ` Bart Van Assche
@ 2025-08-20 11:39   ` Hannes Reinecke
  2 siblings, 0 replies; 31+ messages in thread
From: Hannes Reinecke @ 2025-08-20 11:39 UTC (permalink / raw)
  To: Ewan D. Milne, linux-scsi; +Cc: michael.christie, dgilbert, bvanassche, dlemoal

On 8/15/25 23:15, Ewan D. Milne wrote:
> This makes the handling in read_capacity_10() consistent with other
> cases, e.g. sd_spinup_disk().  Omitting .ascq in scsi_failure did not
> result in wildcard matching, it only handled ASCQ 0x00.  This patch
> changes the retry behavior, we no longer retry 3 times on ASC 0x3a
> if a nonzero ASCQ is ever returned.
> 
> Signed-off-by: Ewan D. Milne <emilne@redhat.com>
> ---
>   drivers/scsi/sd.c | 2 ++
>   1 file changed, 2 insertions(+)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare@suse.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich

* Re: [PATCH v3 3/8] scsi: sd: Have scsi-ml retry read_capacity_16 errors
  2025-08-15 21:15 ` [PATCH v3 3/8] scsi: sd: Have scsi-ml retry read_capacity_16 errors Ewan D. Milne
  2025-08-16  0:42   ` Damien Le Moal
  2025-08-19 19:37   ` Bart Van Assche
@ 2025-08-20 11:40   ` Hannes Reinecke
  2 siblings, 0 replies; 31+ messages in thread
From: Hannes Reinecke @ 2025-08-20 11:40 UTC (permalink / raw)
  To: Ewan D. Milne, linux-scsi; +Cc: michael.christie, dgilbert, bvanassche, dlemoal

On 8/15/25 23:15, Ewan D. Milne wrote:
> This has read_capacity_16 have scsi-ml retry errors instead of driving
> them itself.
> 
> There are 2 behavior changes with this patch:
> 1. There is one behavior change where we no longer retry when
> scsi_execute_cmd returns < 0, but we should be ok. We don't need to retry
> for failures like the queue being removed, and for the case where there
> are no tags/reqs since the block layer waits/retries for us. For possible
> memory allocation failures from blk_rq_map_kern we use GFP_NOIO, so
> retrying will probably not help.
> 2. For the specific UAs we checked for and retried, we would get
> READ_CAPACITY_RETRIES_ON_RESET retries plus whatever retries were left
> from the main loop's retries. Each UA now gets
> READ_CAPACITY_RETRIES_ON_RESET retries, and the other errors get up to 3
> retries. This is most likely ok, because READ_CAPACITY_RETRIES_ON_RESET
> is already 10 and is not based on anything specific like a spec or
> device, so the extra 3 we got from the main loop was probably just an
> accident and is not going to help.
> 
> Original patch by Mike Christie <michael.christie@oracle.com> modified
> based upon review comments for an earlier version of this patch.
> 
> Signed-off-by: Ewan D. Milne <emilne@redhat.com>
> ---
>   drivers/scsi/sd.c | 107 +++++++++++++++++++++++++++++++---------------
>   1 file changed, 73 insertions(+), 34 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare@suse.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich

* Re: [PATCH v3 4/8] scsi: sd: Avoid passing potentially uninitialized "sense_valid" to read_capacity_error()
  2025-08-15 21:15 ` [PATCH v3 4/8] scsi: sd: Avoid passing potentially uninitialized "sense_valid" to read_capacity_error() Ewan D. Milne
  2025-08-16  0:44   ` Damien Le Moal
  2025-08-19 19:38   ` Bart Van Assche
@ 2025-08-20 11:41   ` Hannes Reinecke
  2 siblings, 0 replies; 31+ messages in thread
From: Hannes Reinecke @ 2025-08-20 11:41 UTC (permalink / raw)
  To: Ewan D. Milne, linux-scsi; +Cc: michael.christie, dgilbert, bvanassche, dlemoal

On 8/15/25 23:15, Ewan D. Milne wrote:
> read_capacity_10() sets "sense_valid" in a different conditional statement prior to
> calling read_capacity_error(), and does not use this value otherwise.  Move the call
> to scsi_sense_valid() to read_capacity_error() instead of passing it as a parameter
> from read_capacity_16() and read_capacity_10().
> 
> Signed-off-by: Ewan D. Milne <emilne@redhat.com>
> ---
>   drivers/scsi/sd.c | 12 +++++-------
>   1 file changed, 5 insertions(+), 7 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare@suse.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich

* Re: [PATCH v3 5/8] scsi: sd: Remove checks for -EOVERFLOW in sd_read_capacity()
  2025-08-15 21:15 ` [PATCH v3 5/8] scsi: sd: Remove checks for -EOVERFLOW in sd_read_capacity() Ewan D. Milne
  2025-08-16  0:45   ` Damien Le Moal
  2025-08-19 19:38   ` Bart Van Assche
@ 2025-08-20 11:41   ` Hannes Reinecke
  2 siblings, 0 replies; 31+ messages in thread
From: Hannes Reinecke @ 2025-08-20 11:41 UTC (permalink / raw)
  To: Ewan D. Milne, linux-scsi; +Cc: michael.christie, dgilbert, bvanassche, dlemoal

On 8/15/25 23:15, Ewan D. Milne wrote:
> Remove checks for -EOVERFLOW in sd_read_capacity() because this value has not
> been returned to it since commit 72deb455b5ec ("block: remove CONFIG_LBDAF").
> 
> Signed-off-by: Ewan D. Milne <emilne@redhat.com>
> ---
>   drivers/scsi/sd.c | 4 ----
>   1 file changed, 4 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare@suse.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich

* Re: [PATCH v3 6/8] scsi: sd: Check for and retry in case of READ_CAPCITY(10)/(16) returning no data
  2025-08-15 21:15 ` [PATCH v3 6/8] scsi: sd: Check for and retry in case of READ_CAPCITY(10)/(16) returning no data Ewan D. Milne
  2025-08-16  0:53   ` Damien Le Moal
@ 2025-08-20 11:44   ` Hannes Reinecke
  1 sibling, 0 replies; 31+ messages in thread
From: Hannes Reinecke @ 2025-08-20 11:44 UTC (permalink / raw)
  To: Ewan D. Milne, linux-scsi; +Cc: michael.christie, dgilbert, bvanassche, dlemoal

On 8/15/25 23:15, Ewan D. Milne wrote:
> sd_read_capacity_10() and sd_read_capacity_16() do not check for underflow
> and can extract invalid (e.g. zero) data when a malfunctioning device does
> not actually transfer any data, but returnes a good status otherwise.
> Check for this and retry, and log a message and return -EINVAL if we can't
> get the capacity information.
> 
> We encountered a device that did this once but returned good data afterwards.
> 
> See similar commit 5cd3bbfad088 ("[SCSI] retry with missing data for INQUIRY")
> 
> Signed-off-by: Ewan D. Milne <emilne@redhat.com>
> ---
>   drivers/scsi/sd.c | 61 ++++++++++++++++++++++++++++++++++++++++-------
>   1 file changed, 53 insertions(+), 8 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare@suse.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich

* Re: [PATCH v3 7/8] scsi: Simplify nested if conditional in scsi_probe_lun()
  2025-08-15 21:15 ` [PATCH v3 7/8] scsi: Simplify nested if conditional in scsi_probe_lun() Ewan D. Milne
  2025-08-16  0:56   ` Damien Le Moal
@ 2025-08-20 11:44   ` Hannes Reinecke
  1 sibling, 0 replies; 31+ messages in thread
From: Hannes Reinecke @ 2025-08-20 11:44 UTC (permalink / raw)
  To: Ewan D. Milne, linux-scsi; +Cc: michael.christie, dgilbert, bvanassche, dlemoal

On 8/15/25 23:15, Ewan D. Milne wrote:
> Make code congruent with similar code in read_capacity_16()/read_capacity_10().
> 
> Signed-off-by: Ewan D. Milne <emilne@redhat.com>
> ---
>   drivers/scsi/scsi_scan.c | 5 ++---
>   1 file changed, 2 insertions(+), 3 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare@suse.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich

* Re: [PATCH v3 8/8] scsi: scsi_debug: Add option to suppress returned data but return good status
  2025-08-16  0:59   ` Damien Le Moal
@ 2025-08-20 11:45     ` Hannes Reinecke
  0 siblings, 0 replies; 31+ messages in thread
From: Hannes Reinecke @ 2025-08-20 11:45 UTC (permalink / raw)
  To: Damien Le Moal, Ewan D. Milne, linux-scsi
  Cc: michael.christie, dgilbert, bvanassche

On 8/16/25 02:59, Damien Le Moal wrote:
> On 8/16/25 06:15, Ewan D. Milne wrote:
>> This is used to test the earlier read_capacity_10()/16() retry patch.
>>
>> Signed-off-by: Ewan D. Milne <emilne@redhat.com>
> 
> Looks OK, but it would be nice to be able to suppress the data only for the
> first X commands, so that the retries can be exercised with a success in them
> instead of all of them failing to give data.
> 
> Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
> 
> 
Seconded. The patches expect the device to return valid data after
retry, so we should be checking for that scenario in scsi_debug, too.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare@suse.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich

Thread overview: 31+ messages
2025-08-15 21:15 [PATCH v3 0/8] Retry READ CAPACITY(10)/(16) with good status but no data Ewan D. Milne
2025-08-15 21:15 ` [PATCH v3 1/8] scsi: Explicitly specify .ascq = 0x00 for ASC 0x28/0x29 scsi_failures Ewan D. Milne
2025-08-16  0:36   ` Damien Le Moal
2025-08-19 19:28   ` Bart Van Assche
2025-08-20 11:39   ` Hannes Reinecke
2025-08-15 21:15 ` [PATCH v3 2/8] scsi: sd: Explicitly specify .ascq = SCMD_FAILURE_ASCQ_ANY for ASC 0x3a Ewan D. Milne
2025-08-16  0:37   ` Damien Le Moal
2025-08-19 19:34   ` Bart Van Assche
2025-08-20 11:39   ` Hannes Reinecke
2025-08-15 21:15 ` [PATCH v3 3/8] scsi: sd: Have scsi-ml retry read_capacity_16 errors Ewan D. Milne
2025-08-16  0:42   ` Damien Le Moal
2025-08-19 19:37   ` Bart Van Assche
2025-08-20 11:40   ` Hannes Reinecke
2025-08-15 21:15 ` [PATCH v3 4/8] scsi: sd: Avoid passing potentially uninitialized "sense_valid" to read_capacity_error() Ewan D. Milne
2025-08-16  0:44   ` Damien Le Moal
2025-08-19 19:38   ` Bart Van Assche
2025-08-20 11:41   ` Hannes Reinecke
2025-08-15 21:15 ` [PATCH v3 5/8] scsi: sd: Remove checks for -EOVERFLOW in sd_read_capacity() Ewan D. Milne
2025-08-16  0:45   ` Damien Le Moal
2025-08-19 19:38   ` Bart Van Assche
2025-08-20 11:41   ` Hannes Reinecke
2025-08-15 21:15 ` [PATCH v3 6/8] scsi: sd: Check for and retry in case of READ_CAPCITY(10)/(16) returning no data Ewan D. Milne
2025-08-16  0:53   ` Damien Le Moal
2025-08-19 19:40     ` Bart Van Assche
2025-08-20 11:44   ` Hannes Reinecke
2025-08-15 21:15 ` [PATCH v3 7/8] scsi: Simplify nested if conditional in scsi_probe_lun() Ewan D. Milne
2025-08-16  0:56   ` Damien Le Moal
2025-08-20 11:44   ` Hannes Reinecke
2025-08-15 21:15 ` [PATCH v3 8/8] scsi: scsi_debug: Add option to suppress returned data but return good status Ewan D. Milne
2025-08-16  0:59   ` Damien Le Moal
2025-08-20 11:45     ` Hannes Reinecke