linux-scsi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/5] Retry READ CAPACITY(10)/(16) with good status but no data
@ 2025-07-16 18:48 Ewan D. Milne
  2025-07-16 18:48 ` [PATCH 1/5] scsi: sd: Have scsi-ml retry read_capacity_16 errors Ewan D. Milne
                   ` (5 more replies)
  0 siblings, 6 replies; 14+ messages in thread
From: Ewan D. Milne @ 2025-07-16 18:48 UTC (permalink / raw)
  To: linux-scsi; +Cc: michael.christie, dgilbert

We encountered a SCSI device that responded to the initial READ CAPACITY command
with a good status, but no data was transferred.  This caused a sudden change of
the device capacity to zero when the device was rescanned, for no obvious reason.

This patch series changes read_capacity_10() and read_capacity_16() in sd.c
to retry the command up to 3 times in an attempt to get valid capacity information.
A message is logged if this is ultimately unsuccessful.

There are some predecessor patches, one from a patch in a series by Mike Christie
which changes read_capacity_16() to use the scsi_failures mechanism (which did
not eventually get merged), this makes the changes here much more similar for
both the read_capacity_10 and read_capacity_16() case.  Another patch corrects
a potential use of an uninitialized variable, and a third one removes a check
for -EOVERFLOW that hasn't been needed since commit 72deb455b5ec
("block: remove CONFIG_LBDAF").

The final patch to scsi_debug is allow insertion of the fault to test this change.

Ewan D. Milne (4):
  scsi: sd: Avoid passing potentially uninitialized "sense_valid" to
    read_capacity_error()
  scsi: sd: Remove checks for -EOVERFLOW in sd_read_capacity()
  scsi: sd: Check for and retry in case of READ_CAPCITY(10)/(16)
    returning no data
  scsi: scsi_debug: Add option to suppress returned data but return good
    status

Mike Christie (1):
  scsi: sd: Have scsi-ml retry read_capacity_16 errors

 drivers/scsi/scsi_debug.c |  38 ++++++---
 drivers/scsi/sd.c         | 173 +++++++++++++++++++++++++++-----------
 2 files changed, 151 insertions(+), 60 deletions(-)

-- 
2.47.1


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/5] scsi: sd: Have scsi-ml retry read_capacity_16 errors
  2025-07-16 18:48 [PATCH 0/5] Retry READ CAPACITY(10)/(16) with good status but no data Ewan D. Milne
@ 2025-07-16 18:48 ` Ewan D. Milne
  2025-08-13  0:22   ` Bart Van Assche
  2025-08-13 17:50   ` Bart Van Assche
  2025-07-16 18:48 ` [PATCH 2/5] scsi: sd: Avoid passing potentially uninitialized "sense_valid" to read_capacity_error() Ewan D. Milne
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 14+ messages in thread
From: Ewan D. Milne @ 2025-07-16 18:48 UTC (permalink / raw)
  To: linux-scsi; +Cc: michael.christie, dgilbert

From: Mike Christie <michael.christie@oracle.com>

This has read_capacity_16 have scsi-ml retry errors instead of driving
them itself.

There are 2 behavior changes with this patch:
1. There is one behavior change where we no longer retry when
scsi_execute_cmd returns < 0, but we should be ok. We don't need to retry
for failures like the queue being removed, and for the case where there
are no tags/reqs since the block layer waits/retries for us. For possible
memory allocation failures from blk_rq_map_kern we use GFP_NOIO, so
retrying will probably not help.
2. For the specific UAs we checked for and retried, we would get
READ_CAPACITY_RETRIES_ON_RESET retries plus whatever retries were left
from the main loop's retries. Each UA now gets
READ_CAPACITY_RETRIES_ON_RESET reties, and the other errors get up to 3
retries. This is most likely ok, because READ_CAPACITY_RETRIES_ON_RESET
is already 10 and is not based on anything specific like a spec or
device, so the extra 3 we got from the main loop was probably just an
accident and is not going to help.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/scsi/sd.c | 108 ++++++++++++++++++++++++++++++----------------
 1 file changed, 71 insertions(+), 37 deletions(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 3f6e87705b62..41b7dee5eaf1 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -2626,55 +2626,89 @@ static void read_capacity_error(struct scsi_disk *sdkp, struct scsi_device *sdp,
 static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp,
 		struct queue_limits *lim, unsigned char *buffer)
 {
-	unsigned char cmd[16];
-	struct scsi_sense_hdr sshdr;
-	const struct scsi_exec_args exec_args = {
-		.sshdr = &sshdr,
+	static const u8 cmd[16] = {
+		[0] = SERVICE_ACTION_IN_16,
+		[1] = SAI_READ_CAPACITY_16,
+		[13] = RC16_LEN,
 	};
+	struct scsi_sense_hdr sshdr;
 	int sense_valid = 0;
 	int the_result;
-	int retries = 3, reset_retries = READ_CAPACITY_RETRIES_ON_RESET;
 	unsigned int alignment;
 	unsigned long long lba;
 	unsigned sector_size;
+	struct scsi_failure failure_defs[] = {
+		/*
+		 * Do not retry Invalid Command Operation Code or Invalid
+		 * Field in CDB.
+		 */
+		{
+			.sense = ILLEGAL_REQUEST,
+			.asc = 0x20,
+			.result = SAM_STAT_CHECK_CONDITION,
+		},
+		{
+			.sense = ILLEGAL_REQUEST,
+			.asc = 0x24,
+			.result = SAM_STAT_CHECK_CONDITION,
+		},
+		/* Do not retry Medium Not Present */
+		{
+			.sense = UNIT_ATTENTION,
+			.asc = 0x3A,
+			.result = SAM_STAT_CHECK_CONDITION,
+		},
+		{
+			.sense = NOT_READY,
+			.asc = 0x3A,
+			.result = SAM_STAT_CHECK_CONDITION,
+		},
+		/* Device reset might occur several times so retry a lot */
+		{
+			.sense = UNIT_ATTENTION,
+			.asc = 0x29,
+			.allowed = READ_CAPACITY_RETRIES_ON_RESET,
+			.result = SAM_STAT_CHECK_CONDITION,
+		},
+		/* Any other error not listed above retry 3 times */
+		{
+			.result = SCMD_FAILURE_RESULT_ANY,
+			.allowed = 3,
+		},
+		{}
+	};
+	struct scsi_failures failures = {
+		.failure_definitions = failure_defs,
+	};
+	const struct scsi_exec_args exec_args = {
+		.sshdr = &sshdr,
+		.failures = &failures,
+	};
 
 	if (sdp->no_read_capacity_16)
 		return -EINVAL;
 
-	do {
-		memset(cmd, 0, 16);
-		cmd[0] = SERVICE_ACTION_IN_16;
-		cmd[1] = SAI_READ_CAPACITY_16;
-		cmd[13] = RC16_LEN;
-		memset(buffer, 0, RC16_LEN);
-
-		the_result = scsi_execute_cmd(sdp, cmd, REQ_OP_DRV_IN,
-					      buffer, RC16_LEN, SD_TIMEOUT,
-					      sdkp->max_retries, &exec_args);
-		if (the_result > 0) {
-			if (media_not_present(sdkp, &sshdr))
-				return -ENODEV;
+	memset(buffer, 0, RC16_LEN);
 
-			sense_valid = scsi_sense_valid(&sshdr);
-			if (sense_valid &&
-			    sshdr.sense_key == ILLEGAL_REQUEST &&
-			    (sshdr.asc == 0x20 || sshdr.asc == 0x24) &&
-			    sshdr.ascq == 0x00)
-				/* Invalid Command Operation Code or
-				 * Invalid Field in CDB, just retry
-				 * silently with RC10 */
-				return -EINVAL;
-			if (sense_valid &&
-			    sshdr.sense_key == UNIT_ATTENTION &&
-			    sshdr.asc == 0x29 && sshdr.ascq == 0x00)
-				/* Device reset might occur several times,
-				 * give it one more chance */
-				if (--reset_retries > 0)
-					continue;
-		}
-		retries--;
+	the_result = scsi_execute_cmd(sdp, cmd, REQ_OP_DRV_IN, buffer,
+				      RC16_LEN, SD_TIMEOUT, sdkp->max_retries,
+				      &exec_args);
 
-	} while (the_result && retries);
+	if (the_result > 0) {
+		if (media_not_present(sdkp, &sshdr))
+			return -ENODEV;
+
+		sense_valid = scsi_sense_valid(&sshdr);
+		if (sense_valid && sshdr.sense_key == ILLEGAL_REQUEST &&
+		    (sshdr.asc == 0x20 || sshdr.asc == 0x24) &&
+		     sshdr.ascq == 0x00) {
+			/*
+			 * Invalid Command Operation Code or Invalid Field in
+			 * CDB, just retry silently with RC10
+			 */
+			return -EINVAL;
+		}
+	}
 
 	if (the_result) {
 		sd_print_result(sdkp, "Read Capacity(16) failed", the_result);
-- 
2.47.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/5] scsi: sd: Avoid passing potentially uninitialized "sense_valid" to read_capacity_error()
  2025-07-16 18:48 [PATCH 0/5] Retry READ CAPACITY(10)/(16) with good status but no data Ewan D. Milne
  2025-07-16 18:48 ` [PATCH 1/5] scsi: sd: Have scsi-ml retry read_capacity_16 errors Ewan D. Milne
@ 2025-07-16 18:48 ` Ewan D. Milne
  2025-08-13  0:02   ` Bart Van Assche
  2025-07-16 18:48 ` [PATCH 3/5] scsi: sd: Remove checks for -EOVERFLOW in sd_read_capacity() Ewan D. Milne
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 14+ messages in thread
From: Ewan D. Milne @ 2025-07-16 18:48 UTC (permalink / raw)
  To: linux-scsi; +Cc: michael.christie, dgilbert

read_capacity_10() sets "sense_valid" in a different conditional statement prior to
calling read_capacity_error(), and does not use this value otherwise.  Move the call
to scsi_sense_valid() to read_capacity_error() instead of passing it as a parameter
from read_capacity_16() and read_capacity_10().

Signed-off-by: Ewan D. Milne <emilne@redhat.com>
---
 drivers/scsi/sd.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 41b7dee5eaf1..72ad4b82f0f3 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -2592,9 +2592,10 @@ static void sd_config_protection(struct scsi_disk *sdkp,
 }
 
 static void read_capacity_error(struct scsi_disk *sdkp, struct scsi_device *sdp,
-			struct scsi_sense_hdr *sshdr, int sense_valid,
-			int the_result)
+				struct scsi_sense_hdr *sshdr, int the_result)
 {
+	bool sense_valid = scsi_sense_valid(sshdr);
+
 	if (sense_valid)
 		sd_print_sense_hdr(sdkp, sshdr);
 	else
@@ -2712,7 +2713,7 @@ static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp,
 
 	if (the_result) {
 		sd_print_result(sdkp, "Read Capacity(16) failed", the_result);
-		read_capacity_error(sdkp, sdp, &sshdr, sense_valid, the_result);
+		read_capacity_error(sdkp, sdp, &sshdr, the_result);
 		return -EINVAL;
 	}
 
@@ -2786,7 +2787,6 @@ static int read_capacity_10(struct scsi_disk *sdkp, struct scsi_device *sdp,
 		.sshdr = &sshdr,
 		.failures = &failures,
 	};
-	int sense_valid = 0;
 	int the_result;
 	sector_t lba;
 	unsigned sector_size;
@@ -2798,15 +2798,13 @@ static int read_capacity_10(struct scsi_disk *sdkp, struct scsi_device *sdp,
 				      &exec_args);
 
 	if (the_result > 0) {
-		sense_valid = scsi_sense_valid(&sshdr);
-
 		if (media_not_present(sdkp, &sshdr))
 			return -ENODEV;
 	}
 
 	if (the_result) {
 		sd_print_result(sdkp, "Read Capacity(10) failed", the_result);
-		read_capacity_error(sdkp, sdp, &sshdr, sense_valid, the_result);
+		read_capacity_error(sdkp, sdp, &sshdr, the_result);
 		return -EINVAL;
 	}
 
-- 
2.47.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 3/5] scsi: sd: Remove checks for -EOVERFLOW in sd_read_capacity()
  2025-07-16 18:48 [PATCH 0/5] Retry READ CAPACITY(10)/(16) with good status but no data Ewan D. Milne
  2025-07-16 18:48 ` [PATCH 1/5] scsi: sd: Have scsi-ml retry read_capacity_16 errors Ewan D. Milne
  2025-07-16 18:48 ` [PATCH 2/5] scsi: sd: Avoid passing potentially uninitialized "sense_valid" to read_capacity_error() Ewan D. Milne
@ 2025-07-16 18:48 ` Ewan D. Milne
  2025-08-13  0:04   ` Bart Van Assche
  2025-07-16 18:48 ` [PATCH 4/5] scsi: sd: Check for and retry in case of READ_CAPCITY(10)/(16) returning no data Ewan D. Milne
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 14+ messages in thread
From: Ewan D. Milne @ 2025-07-16 18:48 UTC (permalink / raw)
  To: linux-scsi; +Cc: michael.christie, dgilbert

Remove checks for -EOVERFLOW in sd_read_capacity() because this value has not
been returned to it since commit 72deb455b5ec ("block: remove CONFIG_LBDAF").

Signed-off-by: Ewan D. Milne <emilne@redhat.com>
---
 drivers/scsi/sd.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 72ad4b82f0f3..f8bc5f6a6511 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -2850,8 +2850,6 @@ sd_read_capacity(struct scsi_disk *sdkp, struct queue_limits *lim,
 
 	if (sd_try_rc16_first(sdp)) {
 		sector_size = read_capacity_16(sdkp, sdp, lim, buffer);
-		if (sector_size == -EOVERFLOW)
-			goto got_data;
 		if (sector_size == -ENODEV)
 			return;
 		if (sector_size < 0)
@@ -2860,8 +2858,6 @@ sd_read_capacity(struct scsi_disk *sdkp, struct queue_limits *lim,
 			return;
 	} else {
 		sector_size = read_capacity_10(sdkp, sdp, buffer);
-		if (sector_size == -EOVERFLOW)
-			goto got_data;
 		if (sector_size < 0)
 			return;
 		if ((sizeof(sdkp->capacity) > 4) &&
-- 
2.47.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 4/5] scsi: sd: Check for and retry in case of READ_CAPCITY(10)/(16) returning no data
  2025-07-16 18:48 [PATCH 0/5] Retry READ CAPACITY(10)/(16) with good status but no data Ewan D. Milne
                   ` (2 preceding siblings ...)
  2025-07-16 18:48 ` [PATCH 3/5] scsi: sd: Remove checks for -EOVERFLOW in sd_read_capacity() Ewan D. Milne
@ 2025-07-16 18:48 ` Ewan D. Milne
  2025-08-13  0:13   ` Bart Van Assche
  2025-07-16 18:48 ` [PATCH 5/5] scsi: scsi_debug: Add option to suppress returned data but return good status Ewan D. Milne
  2025-08-01 20:51 ` [PATCH 0/5] Retry READ CAPACITY(10)/(16) with good status but no data Ewan Milne
  5 siblings, 1 reply; 14+ messages in thread
From: Ewan D. Milne @ 2025-07-16 18:48 UTC (permalink / raw)
  To: linux-scsi; +Cc: michael.christie, dgilbert

sd_read_capacity_10() and sd_read_capacity_16() do not check for underflow
and can extract invalid (e.g. zero) data when a malfunctioning device does
not actually transfer any data, but returnes a good status otherwise.
Check for this and retry, and log a message and return -EINVAL if we can't
get the capacity information.

We encountered a device that did this once but returned good data afterwards.

See similar commit 5cd3bbfad088 ("[SCSI] retry with missing data for INQUIRY")

Signed-off-by: Ewan D. Milne <emilne@redhat.com>
---
 drivers/scsi/sd.c | 69 +++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 58 insertions(+), 11 deletions(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index f8bc5f6a6511..c8f2efe3b767 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -2633,6 +2633,7 @@ static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp,
 		[13] = RC16_LEN,
 	};
 	struct scsi_sense_hdr sshdr;
+	int count, resid;
 	int sense_valid = 0;
 	int the_result;
 	unsigned int alignment;
@@ -2683,17 +2684,31 @@ static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp,
 	};
 	const struct scsi_exec_args exec_args = {
 		.sshdr = &sshdr,
+		.resid = &resid,
 		.failures = &failures,
 	};
 
 	if (sdp->no_read_capacity_16)
 		return -EINVAL;
 
-	memset(buffer, 0, RC16_LEN);
+	for (count = 0; count < 3; ++count) {
+		memset(buffer, 0, RC16_LEN);
 
-	the_result = scsi_execute_cmd(sdp, cmd, REQ_OP_DRV_IN, buffer,
-				      RC16_LEN, SD_TIMEOUT, sdkp->max_retries,
-				      &exec_args);
+		the_result = scsi_execute_cmd(sdp, cmd, REQ_OP_DRV_IN,
+					      buffer, RC16_LEN, SD_TIMEOUT,
+					      sdkp->max_retries, &exec_args);
+
+		if (the_result == 0) {
+			/*
+			 * if nothing was transferred, we try
+			 * again. It's a workaround for a broken
+			 * device.
+			 */
+			if (resid == RC16_LEN)
+				continue;
+		}
+		break;
+	}
 
 	if (the_result > 0) {
 		if (media_not_present(sdkp, &sshdr))
@@ -2717,6 +2732,12 @@ static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp,
 		return -EINVAL;
 	}
 
+	if (resid == RC16_LEN) {
+		sd_printk(KERN_ERR, sdkp,
+			  "Read Capacity(16) returned good status but no data");
+		return -EINVAL;
+	}
+
 	sector_size = get_unaligned_be32(&buffer[8]);
 	lba = get_unaligned_be64(&buffer[0]);
 
@@ -2749,11 +2770,20 @@ static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp,
 	return sector_size;
 }
 
+#define RC10_LEN 8
+#if RC10_LEN > SD_BUF_SIZE
+#error RC10_LEN must not be more than SD_BUF_SIZE
+#endif
+
 static int read_capacity_10(struct scsi_disk *sdkp, struct scsi_device *sdp,
 						unsigned char *buffer)
 {
 	static const u8 cmd[10] = { READ_CAPACITY };
 	struct scsi_sense_hdr sshdr;
+	int count, resid;
+	int the_result;
+	sector_t lba;
+	unsigned sector_size;
 	struct scsi_failure failure_defs[] = {
 		/* Do not retry Medium Not Present */
 		{
@@ -2785,17 +2815,28 @@ static int read_capacity_10(struct scsi_disk *sdkp, struct scsi_device *sdp,
 	};
 	const struct scsi_exec_args exec_args = {
 		.sshdr = &sshdr,
+		.resid = &resid,
 		.failures = &failures,
 	};
-	int the_result;
-	sector_t lba;
-	unsigned sector_size;
 
-	memset(buffer, 0, 8);
+	for (count = 0; count < 3; ++count) {
+		memset(buffer, 0, RC10_LEN);
 
-	the_result = scsi_execute_cmd(sdp, cmd, REQ_OP_DRV_IN, buffer,
-				      8, SD_TIMEOUT, sdkp->max_retries,
-				      &exec_args);
+		the_result = scsi_execute_cmd(sdp, cmd, REQ_OP_DRV_IN,
+					      buffer, RC10_LEN, SD_TIMEOUT,
+					      sdkp->max_retries, &exec_args);
+
+		if (the_result == 0) {
+			/*
+			 * if nothing was transferred, we try
+			 * again. It's a workaround for a broken
+			 * device.
+			 */
+			if (resid == RC10_LEN)
+				continue;
+		}
+		break;
+	}
 
 	if (the_result > 0) {
 		if (media_not_present(sdkp, &sshdr))
@@ -2808,6 +2849,12 @@ static int read_capacity_10(struct scsi_disk *sdkp, struct scsi_device *sdp,
 		return -EINVAL;
 	}
 
+	if (resid == RC10_LEN) {
+		sd_printk(KERN_ERR, sdkp,
+			  "Read Capacity(10) returned good status but no data");
+		return -EINVAL;
+	}
+
 	sector_size = get_unaligned_be32(&buffer[4]);
 	lba = get_unaligned_be32(&buffer[0]);
 
-- 
2.47.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 5/5] scsi: scsi_debug: Add option to suppress returned data but return good status
  2025-07-16 18:48 [PATCH 0/5] Retry READ CAPACITY(10)/(16) with good status but no data Ewan D. Milne
                   ` (3 preceding siblings ...)
  2025-07-16 18:48 ` [PATCH 4/5] scsi: sd: Check for and retry in case of READ_CAPCITY(10)/(16) returning no data Ewan D. Milne
@ 2025-07-16 18:48 ` Ewan D. Milne
  2025-08-13  0:16   ` Bart Van Assche
  2025-08-01 20:51 ` [PATCH 0/5] Retry READ CAPACITY(10)/(16) with good status but no data Ewan Milne
  5 siblings, 1 reply; 14+ messages in thread
From: Ewan D. Milne @ 2025-07-16 18:48 UTC (permalink / raw)
  To: linux-scsi; +Cc: michael.christie, dgilbert

This is used to test the earlier read_capacity_10()/16() retry patch.

Signed-off-by: Ewan D. Milne <emilne@redhat.com>
---
 drivers/scsi/scsi_debug.c | 38 +++++++++++++++++++++++++++-----------
 1 file changed, 27 insertions(+), 11 deletions(-)

diff --git a/drivers/scsi/scsi_debug.c b/drivers/scsi/scsi_debug.c
index aef33d1e346a..a8ec653d4795 100644
--- a/drivers/scsi/scsi_debug.c
+++ b/drivers/scsi/scsi_debug.c
@@ -230,6 +230,7 @@ struct tape_block {
 #define SDEBUG_OPT_NO_CDB_NOISE		0x4000
 #define SDEBUG_OPT_HOST_BUSY		0x8000
 #define SDEBUG_OPT_CMD_ABORT		0x10000
+#define SDEBUG_OPT_NO_DATA		0x20000
 #define SDEBUG_OPT_ALL_NOISE (SDEBUG_OPT_NOISE | SDEBUG_OPT_Q_NOISE | \
 			      SDEBUG_OPT_RESET_NOISE)
 #define SDEBUG_OPT_ALL_INJECTING (SDEBUG_OPT_RECOVERED_ERR | \
@@ -1641,10 +1642,17 @@ static int fill_from_dev_buffer(struct scsi_cmnd *scp, unsigned char *arr,
 	if (scp->sc_data_direction != DMA_FROM_DEVICE)
 		return DID_ERROR << 16;
 
-	act_len = sg_copy_from_buffer(sdb->table.sgl, sdb->table.nents,
-				      arr, arr_len);
-	scsi_set_resid(scp, scsi_bufflen(scp) - act_len);
-
+	/*
+	 * Conditionally suppress DATA IN transfer and leave resid set to bufflen.
+	 */
+	if (unlikely((sdebug_opts & SDEBUG_OPT_NO_DATA) &&
+		      atomic_read(&sdeb_inject_pending))) {
+		scsi_set_resid(scp, scsi_bufflen(scp));
+	} else {
+		act_len = sg_copy_from_buffer(sdb->table.sgl, sdb->table.nents,
+					      arr, arr_len);
+		scsi_set_resid(scp, scsi_bufflen(scp) - act_len);
+	}
 	return 0;
 }
 
@@ -1665,13 +1673,21 @@ static int p_fill_from_dev_buffer(struct scsi_cmnd *scp, const void *arr,
 	if (scp->sc_data_direction != DMA_FROM_DEVICE)
 		return DID_ERROR << 16;
 
-	act_len = sg_pcopy_from_buffer(sdb->table.sgl, sdb->table.nents,
-				       arr, arr_len, skip);
-	pr_debug("%s: off_dst=%u, scsi_bufflen=%u, act_len=%u, resid=%d\n",
-		 __func__, off_dst, scsi_bufflen(scp), act_len,
-		 scsi_get_resid(scp));
-	n = scsi_bufflen(scp) - (off_dst + act_len);
-	scsi_set_resid(scp, min_t(u32, scsi_get_resid(scp), n));
+	/*
+	 * Conditionally suppress DATA IN transfer and leave resid set to bufflen.
+	 */
+	if (unlikely((sdebug_opts & SDEBUG_OPT_NO_DATA) &&
+		      atomic_read(&sdeb_inject_pending))) {
+		scsi_set_resid(scp, scsi_bufflen(scp));
+	} else {
+		act_len = sg_pcopy_from_buffer(sdb->table.sgl, sdb->table.nents,
+					       arr, arr_len, skip);
+		pr_debug("%s: off_dst=%u, scsi_bufflen=%u, act_len=%u, resid=%d\n",
+			 __func__, off_dst, scsi_bufflen(scp), act_len,
+			 scsi_get_resid(scp));
+		n = scsi_bufflen(scp) - (off_dst + act_len);
+		scsi_set_resid(scp, min_t(u32, scsi_get_resid(scp), n));
+	}
 	return 0;
 }
 
-- 
2.47.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/5] Retry READ CAPACITY(10)/(16) with good status but no data
  2025-07-16 18:48 [PATCH 0/5] Retry READ CAPACITY(10)/(16) with good status but no data Ewan D. Milne
                   ` (4 preceding siblings ...)
  2025-07-16 18:48 ` [PATCH 5/5] scsi: scsi_debug: Add option to suppress returned data but return good status Ewan D. Milne
@ 2025-08-01 20:51 ` Ewan Milne
  2025-08-06  2:22   ` Martin K. Petersen
  5 siblings, 1 reply; 14+ messages in thread
From: Ewan Milne @ 2025-08-01 20:51 UTC (permalink / raw)
  To: linux-scsi

Gentle ping for reviews.

Thanks.

-Ewan

On Wed, Jul 16, 2025 at 2:48 PM Ewan D. Milne <emilne@redhat.com> wrote:
>
> We encountered a SCSI device that responded to the initial READ CAPACITY command
> with a good status, but no data was transferred.  This caused a sudden change of
> the device capacity to zero when the device was rescanned, for no obvious reason.
>
> This patch series changes read_capacity_10() and read_capacity_16() in sd.c
> to retry the command up to 3 times in an attempt to get valid capacity information.
> A message is logged if this is ultimately unsuccessful.
>
> There are some predecessor patches, one from a patch in a series by Mike Christie
> which changes read_capacity_16() to use the scsi_failures mechanism (which did
> not eventually get merged), this makes the changes here much more similar for
> both the read_capacity_10 and read_capacity_16() case.  Another patch corrects
> a potential use of an uninitialized variable, and a third one removes a check
> for -EOVERFLOW that hasn't been needed since commit 72deb455b5ec
> ("block: remove CONFIG_LBDAF").
>
> The final patch to scsi_debug is allow insertion of the fault to test this change.
>
> Ewan D. Milne (4):
>   scsi: sd: Avoid passing potentially uninitialized "sense_valid" to
>     read_capacity_error()
>   scsi: sd: Remove checks for -EOVERFLOW in sd_read_capacity()
>   scsi: sd: Check for and retry in case of READ_CAPCITY(10)/(16)
>     returning no data
>   scsi: scsi_debug: Add option to suppress returned data but return good
>     status
>
> Mike Christie (1):
>   scsi: sd: Have scsi-ml retry read_capacity_16 errors
>
>  drivers/scsi/scsi_debug.c |  38 ++++++---
>  drivers/scsi/sd.c         | 173 +++++++++++++++++++++++++++-----------
>  2 files changed, 151 insertions(+), 60 deletions(-)
>
> --
> 2.47.1
>
>


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/5] Retry READ CAPACITY(10)/(16) with good status but no data
  2025-08-01 20:51 ` [PATCH 0/5] Retry READ CAPACITY(10)/(16) with good status but no data Ewan Milne
@ 2025-08-06  2:22   ` Martin K. Petersen
  0 siblings, 0 replies; 14+ messages in thread
From: Martin K. Petersen @ 2025-08-06  2:22 UTC (permalink / raw)
  To: Ewan Milne; +Cc: linux-scsi


Ewan,

> Gentle ping for reviews.

I will take another look once the merge window has closed. But no
objections to the general approach.

Since this constitutes a core change, I would like another reviewer or
two...

-- 
Martin K. Petersen

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/5] scsi: sd: Avoid passing potentially uninitialized "sense_valid" to read_capacity_error()
  2025-07-16 18:48 ` [PATCH 2/5] scsi: sd: Avoid passing potentially uninitialized "sense_valid" to read_capacity_error() Ewan D. Milne
@ 2025-08-13  0:02   ` Bart Van Assche
  0 siblings, 0 replies; 14+ messages in thread
From: Bart Van Assche @ 2025-08-13  0:02 UTC (permalink / raw)
  To: Ewan D. Milne, linux-scsi; +Cc: michael.christie, dgilbert

On 7/16/25 11:48 AM, Ewan D. Milne wrote:
> @@ -2712,7 +2713,7 @@ static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp,
>   
>   	if (the_result) {
>   		sd_print_result(sdkp, "Read Capacity(16) failed", the_result);
> -		read_capacity_error(sdkp, sdp, &sshdr, sense_valid, the_result);
> +		read_capacity_error(sdkp, sdp, &sshdr, the_result);
>   		return -EINVAL;
>   	}
>   

Readability of read_capacity_16() would improve if the 'sense_valid'
declaration would be moved into the only scope where it is used.
Otherwise this patch looks good to me.

Bart.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/5] scsi: sd: Remove checks for -EOVERFLOW in sd_read_capacity()
  2025-07-16 18:48 ` [PATCH 3/5] scsi: sd: Remove checks for -EOVERFLOW in sd_read_capacity() Ewan D. Milne
@ 2025-08-13  0:04   ` Bart Van Assche
  0 siblings, 0 replies; 14+ messages in thread
From: Bart Van Assche @ 2025-08-13  0:04 UTC (permalink / raw)
  To: Ewan D. Milne, linux-scsi; +Cc: michael.christie, dgilbert

On 7/16/25 11:48 AM, Ewan D. Milne wrote:
> Remove checks for -EOVERFLOW in sd_read_capacity() because this value has not
> been returned to it since commit 72deb455b5ec ("block: remove CONFIG_LBDAF").
Reviewed-by: Bart Van Assche <bvanassche@acm.org>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 4/5] scsi: sd: Check for and retry in case of READ_CAPCITY(10)/(16) returning no data
  2025-07-16 18:48 ` [PATCH 4/5] scsi: sd: Check for and retry in case of READ_CAPCITY(10)/(16) returning no data Ewan D. Milne
@ 2025-08-13  0:13   ` Bart Van Assche
  0 siblings, 0 replies; 14+ messages in thread
From: Bart Van Assche @ 2025-08-13  0:13 UTC (permalink / raw)
  To: Ewan D. Milne, linux-scsi; +Cc: michael.christie, dgilbert

On 7/16/25 11:48 AM, Ewan D. Milne wrote:
> sd_read_capacity_10() and sd_read_capacity_16() do not check for underflow
> and can extract invalid (e.g. zero) data when a malfunctioning device does
> not actually transfer any data, but returnes a good status otherwise.
                                       returns?

> +		if (the_result == 0) {
> +			/*
> +			 * if nothing was transferred, we try
> +			 * again. It's a workaround for a broken
> +			 * device.
> +			 */
> +			if (resid == RC16_LEN)
> +				continue;
> +		}

Please combine the above two if-statements into a single if-statement.

> +#define RC10_LEN 8
> +#if RC10_LEN > SD_BUF_SIZE
> +#error RC10_LEN must not be more than SD_BUF_SIZE
> +#endif

Please pass the 'buffer' length as an argument to read_capacity_10().
That will make it easier to spot an inconsistent buffer size value by
auditing the caller of this function (I am aware that the above code
follows the style of the read_capacity_16() function).

> +		if (the_result == 0) {
> +			/*
> +			 * if nothing was transferred, we try
> +			 * again. It's a workaround for a broken
> +			 * device.
> +			 */
> +			if (resid == RC10_LEN)
> +				continue;
> +		}

Also here, please combine the two if-statements into a single if-statement.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 5/5] scsi: scsi_debug: Add option to suppress returned data but return good status
  2025-07-16 18:48 ` [PATCH 5/5] scsi: scsi_debug: Add option to suppress returned data but return good status Ewan D. Milne
@ 2025-08-13  0:16   ` Bart Van Assche
  0 siblings, 0 replies; 14+ messages in thread
From: Bart Van Assche @ 2025-08-13  0:16 UTC (permalink / raw)
  To: Ewan D. Milne, linux-scsi; +Cc: michael.christie, dgilbert


On 7/16/25 11:48 AM, Ewan D. Milne wrote:
> diff --git a/drivers/scsi/scsi_debug.c b/drivers/scsi/scsi_debug.c
> index aef33d1e346a..a8ec653d4795 100644
> --- a/drivers/scsi/scsi_debug.c
> +++ b/drivers/scsi/scsi_debug.c
> @@ -230,6 +230,7 @@ struct tape_block {
>   #define SDEBUG_OPT_NO_CDB_NOISE		0x4000
>   #define SDEBUG_OPT_HOST_BUSY		0x8000
>   #define SDEBUG_OPT_CMD_ABORT		0x10000
> +#define SDEBUG_OPT_NO_DATA		0x20000
>   #define SDEBUG_OPT_ALL_NOISE (SDEBUG_OPT_NOISE | SDEBUG_OPT_Q_NOISE | \
>   			      SDEBUG_OPT_RESET_NOISE)
>   #define SDEBUG_OPT_ALL_INJECTING (SDEBUG_OPT_RECOVERED_ERR | \
> @@ -1641,10 +1642,17 @@ static int fill_from_dev_buffer(struct scsi_cmnd *scp, unsigned char *arr,
>   	if (scp->sc_data_direction != DMA_FROM_DEVICE)
>   		return DID_ERROR << 16;
>   
> -	act_len = sg_copy_from_buffer(sdb->table.sgl, sdb->table.nents,
> -				      arr, arr_len);
> -	scsi_set_resid(scp, scsi_bufflen(scp) - act_len);
> -
> +	/*
> +	 * Conditionally suppress DATA IN transfer and leave resid set to bufflen.
> +	 */
> +	if (unlikely((sdebug_opts & SDEBUG_OPT_NO_DATA) &&
> +		      atomic_read(&sdeb_inject_pending))) {
> +		scsi_set_resid(scp, scsi_bufflen(scp));
> +	} else {
> +		act_len = sg_copy_from_buffer(sdb->table.sgl, sdb->table.nents,
> +					      arr, arr_len);
> +		scsi_set_resid(scp, scsi_bufflen(scp) - act_len);
> +	}
>   	return 0;
>   }
>   
> @@ -1665,13 +1673,21 @@ static int p_fill_from_dev_buffer(struct scsi_cmnd *scp, const void *arr,
>   	if (scp->sc_data_direction != DMA_FROM_DEVICE)
>   		return DID_ERROR << 16;
>   
> -	act_len = sg_pcopy_from_buffer(sdb->table.sgl, sdb->table.nents,
> -				       arr, arr_len, skip);
> -	pr_debug("%s: off_dst=%u, scsi_bufflen=%u, act_len=%u, resid=%d\n",
> -		 __func__, off_dst, scsi_bufflen(scp), act_len,
> -		 scsi_get_resid(scp));
> -	n = scsi_bufflen(scp) - (off_dst + act_len);
> -	scsi_set_resid(scp, min_t(u32, scsi_get_resid(scp), n));
> +	/*
> +	 * Conditionally suppress DATA IN transfer and leave resid set to bufflen.
> +	 */
> +	if (unlikely((sdebug_opts & SDEBUG_OPT_NO_DATA) &&
> +		      atomic_read(&sdeb_inject_pending))) {
> +		scsi_set_resid(scp, scsi_bufflen(scp));
> +	} else {
> +		act_len = sg_pcopy_from_buffer(sdb->table.sgl, sdb->table.nents,
> +					       arr, arr_len, skip);
> +		pr_debug("%s: off_dst=%u, scsi_bufflen=%u, act_len=%u, resid=%d\n",
> +			 __func__, off_dst, scsi_bufflen(scp), act_len,
> +			 scsi_get_resid(scp));
> +		n = scsi_bufflen(scp) - (off_dst + act_len);
> +		scsi_set_resid(scp, min_t(u32, scsi_get_resid(scp), n));
> +	}
>   	return 0;
>   }

Here and above, please combine the two scsi_set_resid() calls into a
single scsi_set_resid() call, e.g. by setting act_len to zero when the
DATA IN transfer is suppressed.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/5] scsi: sd: Have scsi-ml retry read_capacity_16 errors
  2025-07-16 18:48 ` [PATCH 1/5] scsi: sd: Have scsi-ml retry read_capacity_16 errors Ewan D. Milne
@ 2025-08-13  0:22   ` Bart Van Assche
  2025-08-13 17:50   ` Bart Van Assche
  1 sibling, 0 replies; 14+ messages in thread
From: Bart Van Assche @ 2025-08-13  0:22 UTC (permalink / raw)
  To: Ewan D. Milne, linux-scsi; +Cc: michael.christie, dgilbert

On 7/16/25 11:48 AM, Ewan D. Milne wrote:
> This has read_capacity_16 have scsi-ml retry errors instead of driving
> them itself.
> 
> There are 2 behavior changes with this patch:
> 1. There is one behavior change where we no longer retry when
> scsi_execute_cmd returns < 0, but we should be ok. We don't need to retry
> for failures like the queue being removed, and for the case where there
> are no tags/reqs since the block layer waits/retries for us. For possible
> memory allocation failures from blk_rq_map_kern we use GFP_NOIO, so
> retrying will probably not help.
> 2. For the specific UAs we checked for and retried, we would get
> READ_CAPACITY_RETRIES_ON_RESET retries plus whatever retries were left
> from the main loop's retries. Each UA now gets
> READ_CAPACITY_RETRIES_ON_RESET reties, and the other errors get up to 3
> retries. This is most likely ok, because READ_CAPACITY_RETRIES_ON_RESET
> is already 10 and is not based on anything specific like a spec or
> device, so the extra 3 we got from the main loop was probably just an
> accident and is not going to help.

Is this the same patch as 
https://lore.kernel.org/linux-scsi/89938a66-7203-4809-8d45-c820b5a21d4c@oracle.com/? 
If so, does
John Garry's Reviewed-by still apply?

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/5] scsi: sd: Have scsi-ml retry read_capacity_16 errors
  2025-07-16 18:48 ` [PATCH 1/5] scsi: sd: Have scsi-ml retry read_capacity_16 errors Ewan D. Milne
  2025-08-13  0:22   ` Bart Van Assche
@ 2025-08-13 17:50   ` Bart Van Assche
  1 sibling, 0 replies; 14+ messages in thread
From: Bart Van Assche @ 2025-08-13 17:50 UTC (permalink / raw)
  To: Ewan D. Milne, linux-scsi; +Cc: michael.christie, dgilbert

On 7/16/25 11:48 AM, Ewan D. Milne wrote:
> +		/*
> +		 * Do not retry Invalid Command Operation Code or Invalid
> +		 * Field in CDB.
> +		 */
> +		{
> +			.sense = ILLEGAL_REQUEST,
> +			.asc = 0x20,
> +			.result = SAM_STAT_CHECK_CONDITION,
> +		},
> +		{
> +			.sense = ILLEGAL_REQUEST,
> +			.asc = 0x24,
> +			.result = SAM_STAT_CHECK_CONDITION,
> +		},
> +		/* Do not retry Medium Not Present */
> +		{
> +			.sense = UNIT_ATTENTION,
> +			.asc = 0x3A,
> +			.result = SAM_STAT_CHECK_CONDITION,
> +		},
> +		{
> +			.sense = NOT_READY,
> +			.asc = 0x3A,
> +			.result = SAM_STAT_CHECK_CONDITION,
> +		},
> +		/* Device reset might occur several times so retry a lot */
> +		{
> +			.sense = UNIT_ATTENTION,
> +			.asc = 0x29,
> +			.allowed = READ_CAPACITY_RETRIES_ON_RESET,
> +			.result = SAM_STAT_CHECK_CONDITION,
> +		},

For the first, second and fifth array elements above: leaving out .ascq
is the same as requiring that the ASCQ value is zero. I prefer to make
this explicit by adding ".ascq = 0," in these array elements.

Additionally, media_not_present() doesn't check the ASCQ value while
the above array only accepts ASCQ == 0 if ASC == 0x3a. Please either
mention this behavior change in the patch description or add the
following in the third and fourth array elements:
".ascq = SCMD_FAILURE_ASCQ_ANY,".

> +	memset(buffer, 0, RC16_LEN);

Isn't the preferred style "memset(buffer, 0, ARRAY_SIZE(buffer))"? This
makes it easier for readers to verify that the third argument is
correct.

Otherwise this patch looks good to me.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2025-08-13 17:50 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-07-16 18:48 [PATCH 0/5] Retry READ CAPACITY(10)/(16) with good status but no data Ewan D. Milne
2025-07-16 18:48 ` [PATCH 1/5] scsi: sd: Have scsi-ml retry read_capacity_16 errors Ewan D. Milne
2025-08-13  0:22   ` Bart Van Assche
2025-08-13 17:50   ` Bart Van Assche
2025-07-16 18:48 ` [PATCH 2/5] scsi: sd: Avoid passing potentially uninitialized "sense_valid" to read_capacity_error() Ewan D. Milne
2025-08-13  0:02   ` Bart Van Assche
2025-07-16 18:48 ` [PATCH 3/5] scsi: sd: Remove checks for -EOVERFLOW in sd_read_capacity() Ewan D. Milne
2025-08-13  0:04   ` Bart Van Assche
2025-07-16 18:48 ` [PATCH 4/5] scsi: sd: Check for and retry in case of READ_CAPCITY(10)/(16) returning no data Ewan D. Milne
2025-08-13  0:13   ` Bart Van Assche
2025-07-16 18:48 ` [PATCH 5/5] scsi: scsi_debug: Add option to suppress returned data but return good status Ewan D. Milne
2025-08-13  0:16   ` Bart Van Assche
2025-08-01 20:51 ` [PATCH 0/5] Retry READ CAPACITY(10)/(16) with good status but no data Ewan Milne
2025-08-06  2:22   ` Martin K. Petersen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).