public inbox for stable@vger.kernel.org
 help / color / mirror / Atom feed
* Re: Broken Domain Validation in 6.1.84+
       [not found]           ` <03ef7afd-98f5-4f1b-8330-329f47139ddf@bell.net>
@ 2024-04-06 18:51             ` James Bottomley
  2024-04-07  6:16               ` Greg KH
       [not found]             ` <yq1wmp9pb0d.fsf@ca-mkp.ca.oracle.com>
  1 sibling, 1 reply; 5+ messages in thread
From: James Bottomley @ 2024-04-06 18:51 UTC (permalink / raw)
  To: John David Anglin, Bart Van Assche, linux-parisc; +Cc: linux-scsi, stable

[cc stable to see if they have any ideas about fixing this]
On Sat, 2024-04-06 at 12:16 -0400, John David Anglin wrote:
> On 2024-04-06 11:06 a.m., James Bottomley wrote:
> > On Sat, 2024-04-06 at 10:30 -0400, John David Anglin wrote:
> > > On 2024-04-05 3:36 p.m., Bart Van Assche wrote:
> > > > On 4/4/24 13:07, John David Anglin wrote:
> > > > > On 2024-04-04 12:32 p.m., Bart Van Assche wrote:
> > > > > > Can you please help with verifying whether this kernel warn
> > > > > > ing is only triggered by the 6.1 stable kernel series or
> > > > > > whether it is also
> > > > > > triggered by a vanilla kernel, e.g. kernel v6.8? That will 
> > > > > > tell us whether we 
> > > > > > need to review the upstream changes or the backp
> > > > > > orts on the v6.1 branch.
> > > > > Stable kernel v6.8.3 is okay.
> > > > Would it be possible to bisect this issue on the linux-6.1.y
> > > > branch? That probably will be faster than reviewing all
> > > > backports
> > > > of SCSI patches on that branch.
> > > The warning triggers with v6.1.81.  It doesn't trigger with
> > > v6.1.80.
> > It's this patch:
> > 
> > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-6.1.y&id=cf33e6ca12d814e1be2263cb76960d0019d7fb94
> > 
> > The specific problem being that the update to scsi_execute doesn't
> > set the sense_len that the WARN_ON is checking.
> > 
> > This isn't a problem in mainline because we've converted all uses
> > of scsi_execute.  Stable needs to either complete the conversion or
> > back out the inital patch. This change depends on the above change:
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-6.1.y&id=b73dd5f9997279715cd450ee8ca599aaff2eabb9
> 
> Thus, more than just the initial patch needs to be backed out.

OK, so the reason the bad patch got pulled in is because it's a
precursor of this fixes tagged backport:

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-6.1.y&id=b73dd5f9997279715cd450ee8ca599aaff2eabb9

Which is presumably the other patch you had to back out to fix the
issue.

The problem is that Mike's series updating and then removing
scsi_execute() went into the tree as one series, so no-one notice the
first patch had this bug because the buggy routine got removed at the
end of the series.  This also means there's nothing to fix and backport
in upstream.

The bug is also more widely spread than simply domain validation,
because every use of scsi_execute in the current stable tree will trip
this.

I'm not sure what the best fix is.  I can certainly come up with a one
line fix for stable adding the missing length in the #define, but it
can't come from upstream as stated above.  We could back the two
patches out then do a stable specific fix for the UAS problem (I don't
think we can leave the UAS patch backed out because the problem was
pretty serious).

What does stable want to do?

James


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Broken Domain Validation in 6.1.84+
  2024-04-06 18:51             ` Broken Domain Validation in 6.1.84+ James Bottomley
@ 2024-04-07  6:16               ` Greg KH
  0 siblings, 0 replies; 5+ messages in thread
From: Greg KH @ 2024-04-07  6:16 UTC (permalink / raw)
  To: James Bottomley
  Cc: John David Anglin, Bart Van Assche, linux-parisc, linux-scsi,
	stable

On Sat, Apr 06, 2024 at 02:51:36PM -0400, James Bottomley wrote:
> [cc stable to see if they have any ideas about fixing this]
> On Sat, 2024-04-06 at 12:16 -0400, John David Anglin wrote:
> > On 2024-04-06 11:06 a.m., James Bottomley wrote:
> > > On Sat, 2024-04-06 at 10:30 -0400, John David Anglin wrote:
> > > > On 2024-04-05 3:36 p.m., Bart Van Assche wrote:
> > > > > On 4/4/24 13:07, John David Anglin wrote:
> > > > > > On 2024-04-04 12:32 p.m., Bart Van Assche wrote:
> > > > > > > Can you please help with verifying whether this kernel warn
> > > > > > > ing is only triggered by the 6.1 stable kernel series or
> > > > > > > whether it is also
> > > > > > > triggered by a vanilla kernel, e.g. kernel v6.8? That will 
> > > > > > > tell us whether we 
> > > > > > > need to review the upstream changes or the backp
> > > > > > > orts on the v6.1 branch.
> > > > > > Stable kernel v6.8.3 is okay.
> > > > > Would it be possible to bisect this issue on the linux-6.1.y
> > > > > branch? That probably will be faster than reviewing all
> > > > > backports
> > > > > of SCSI patches on that branch.
> > > > The warning triggers with v6.1.81.  It doesn't trigger with
> > > > v6.1.80.
> > > It's this patch:
> > > 
> > > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-6.1.y&id=cf33e6ca12d814e1be2263cb76960d0019d7fb94
> > > 
> > > The specific problem being that the update to scsi_execute doesn't
> > > set the sense_len that the WARN_ON is checking.
> > > 
> > > This isn't a problem in mainline because we've converted all uses
> > > of scsi_execute.  Stable needs to either complete the conversion or
> > > back out the inital patch. This change depends on the above change:
> > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-6.1.y&id=b73dd5f9997279715cd450ee8ca599aaff2eabb9
> > 
> > Thus, more than just the initial patch needs to be backed out.
> 
> OK, so the reason the bad patch got pulled in is because it's a
> precursor of this fixes tagged backport:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-6.1.y&id=b73dd5f9997279715cd450ee8ca599aaff2eabb9
> 
> Which is presumably the other patch you had to back out to fix the
> issue.
> 
> The problem is that Mike's series updating and then removing
> scsi_execute() went into the tree as one series, so no-one notice the
> first patch had this bug because the buggy routine got removed at the
> end of the series.  This also means there's nothing to fix and backport
> in upstream.
> 
> The bug is also more widely spread than simply domain validation,
> because every use of scsi_execute in the current stable tree will trip
> this.
> 
> I'm not sure what the best fix is.  I can certainly come up with a one
> line fix for stable adding the missing length in the #define, but it
> can't come from upstream as stated above.  We could back the two
> patches out then do a stable specific fix for the UAS problem (I don't
> think we can leave the UAS patch backed out because the problem was
> pretty serious).
> 
> What does stable want to do?

We want to do whatever is in Linus's tree if at all possible.  Or revert
anything we applied that we shouldn't have.  Either one is fine with us,
just let us know what to do here.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Broken Domain Validation in 6.1.84+
       [not found]               ` <b3df77f6-2928-46cd-a7ee-f806d4c937d1@bell.net>
@ 2024-04-08 17:19                 ` Martin K. Petersen
  2024-04-11  4:40                   ` Cyril Brulebois
  2024-04-11  7:29                   ` Greg KH
  0 siblings, 2 replies; 5+ messages in thread
From: Martin K. Petersen @ 2024-04-08 17:19 UTC (permalink / raw)
  To: John David Anglin
  Cc: Martin K. Petersen, James Bottomley, Bart Van Assche,
	linux-parisc, linux-scsi, Greg KH, stable


Dave,

>> Could you please try the patch below on top of v6.1.80?
> Works okay on top of v6.1.80:
>
> [   30.952668] scsi 6:0:0:0: Direct-Access     HP 73.4G ST373207LW       HPC1 PQ: 0 ANSI: 3
> [   31.072592] scsi target6:0:0: Beginning Domain Validation
> [   31.139334] scsi 6:0:0:0: Power-on or device reset occurred
> [   31.186227] scsi target6:0:0: Ending Domain Validation
> [   31.240482] scsi target6:0:0: FAST-160 WIDE SCSI 320.0 MB/s DT IU QAS RTI WRFLOW PCOMP (6.25 ns, offset 63)
> [   31.462587] ata5: SATA link down (SStatus 0 SControl 0)
> [   31.618798] scsi 6:0:2:0: Direct-Access     HP 73.4G ST373207LW       HPC1 PQ: 0 ANSI: 3
> [   31.732588] scsi target6:0:2: Beginning Domain Validation
> [   31.799201] scsi 6:0:2:0: Power-on or device reset occurred
> [   31.846724] scsi target6:0:2: Ending Domain Validation
> [   31.900822] scsi target6:0:2: FAST-160 WIDE SCSI 320.0 MB/s DT IU QAS RTI WRFLOW PCOMP (6.25 ns, offset 63)

Great, thanks for testing!

Greg, please revert the following commits from linux-6.1.y:

b73dd5f99972 ("scsi: sd: usb_storage: uas: Access media prior to querying device properties")
cf33e6ca12d8 ("scsi: core: Add struct for args to execution functions")

and include the patch below instead.

Thank you!

-- 
Martin K. Petersen	Oracle Linux Engineering

From 87441914d491c01b73b949663c101056a9d9b8c7 Mon Sep 17 00:00:00 2001
From: "Martin K. Petersen" <martin.petersen@oracle.com>
Date: Tue, 13 Feb 2024 09:33:06 -0500
Subject: [PATCH] scsi: sd: usb_storage: uas: Access media prior to querying
 device properties

[ Upstream commit 321da3dc1f3c92a12e3c5da934090d2992a8814c ]

It has been observed that some USB/UAS devices return generic properties
hardcoded in firmware for mode pages for a period of time after a device
has been discovered. The reported properties are either garbage or they do
not accurately reflect the characteristics of the physical storage device
attached in the case of a bridge.

Prior to commit 1e029397d12f ("scsi: sd: Reorganize DIF/DIX code to
avoid calling revalidate twice") we would call revalidate several
times during device discovery. As a result, incorrect values would
eventually get replaced with ones accurately describing the attached
storage. When we did away with the redundant revalidate pass, several
cases were reported where devices reported nonsensical values or would
end up in write-protected state.

An initial attempt at addressing this issue involved introducing a
delayed second revalidate invocation. However, this approach still
left some devices reporting incorrect characteristics.

Tasos Sahanidis debugged the problem further and identified that
introducing a READ operation prior to MODE SENSE fixed the problem and that
it wasn't a timing issue. Issuing a READ appears to cause the devices to
update their state to reflect the actual properties of the storage
media. Device properties like vendor, model, and storage capacity appear to
be correctly reported from the get-go. It is unclear why these devices
defer populating the remaining characteristics.

Match the behavior of a well known commercial operating system and
trigger a READ operation prior to querying device characteristics to
force the device to populate the mode pages.

The additional READ is triggered by a flag set in the USB storage and
UAS drivers. We avoid issuing the READ for other transport classes
since some storage devices identify Linux through our particular
discovery command sequence.

Link: https://lore.kernel.org/r/20240213143306.2194237-1-martin.petersen@oracle.com
Fixes: 1e029397d12f ("scsi: sd: Reorganize DIF/DIX code to avoid calling revalidate twice")
Cc: stable@vger.kernel.org
Reported-by: Tasos Sahanidis <tasos@tasossah.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Tested-by: Tasos Sahanidis <tasos@tasossah.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 31b5273f43a7..349b1455a2c6 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3284,6 +3284,24 @@ static bool sd_validate_opt_xfer_size(struct scsi_disk *sdkp,
 	return true;
 }
 
+static void sd_read_block_zero(struct scsi_disk *sdkp)
+{
+	unsigned int buf_len = sdkp->device->sector_size;
+	char *buffer, cmd[10] = { };
+
+	buffer = kmalloc(buf_len, GFP_KERNEL);
+	if (!buffer)
+		return;
+
+	cmd[0] = READ_10;
+	put_unaligned_be32(0, &cmd[2]); /* Logical block address 0 */
+	put_unaligned_be16(1, &cmd[7]);	/* Transfer 1 logical block */
+
+	scsi_execute_req(sdkp->device, cmd, DMA_FROM_DEVICE, buffer, buf_len,
+			 NULL, SD_TIMEOUT, sdkp->max_retries, NULL);
+	kfree(buffer);
+}
+
 /**
  *	sd_revalidate_disk - called the first time a new disk is seen,
  *	performs disk spin up, read_capacity, etc.
@@ -3323,7 +3341,13 @@ static int sd_revalidate_disk(struct gendisk *disk)
 	 */
 	if (sdkp->media_present) {
 		sd_read_capacity(sdkp, buffer);
-
+		/*
+		 * Some USB/UAS devices return generic values for mode pages
+		 * until the media has been accessed. Trigger a READ operation
+		 * to force the device to populate mode pages.
+		 */
+		if (sdp->read_before_ms)
+			sd_read_block_zero(sdkp);
 		/*
 		 * set the default to rotational.  All non-rotational devices
 		 * support the block characteristics VPD page, which will
diff --git a/drivers/usb/storage/scsiglue.c b/drivers/usb/storage/scsiglue.c
index c54e9805da53..12cf9940e5b6 100644
--- a/drivers/usb/storage/scsiglue.c
+++ b/drivers/usb/storage/scsiglue.c
@@ -179,6 +179,13 @@ static int slave_configure(struct scsi_device *sdev)
 		 */
 		sdev->use_192_bytes_for_3f = 1;
 
+		/*
+		 * Some devices report generic values until the media has been
+		 * accessed. Force a READ(10) prior to querying device
+		 * characteristics.
+		 */
+		sdev->read_before_ms = 1;
+
 		/*
 		 * Some devices don't like MODE SENSE with page=0x3f,
 		 * which is the command used for checking if a device
diff --git a/drivers/usb/storage/uas.c b/drivers/usb/storage/uas.c
index de3836412bf3..ed22053b3252 100644
--- a/drivers/usb/storage/uas.c
+++ b/drivers/usb/storage/uas.c
@@ -878,6 +878,13 @@ static int uas_slave_configure(struct scsi_device *sdev)
 	if (devinfo->flags & US_FL_CAPACITY_HEURISTICS)
 		sdev->guess_capacity = 1;
 
+	/*
+	 * Some devices report generic values until the media has been
+	 * accessed. Force a READ(10) prior to querying device
+	 * characteristics.
+	 */
+	sdev->read_before_ms = 1;
+
 	/*
 	 * Some devices don't like MODE SENSE with page=0x3f,
 	 * which is the command used for checking if a device
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index d2751ed536df..1504d3137cc6 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -204,6 +204,7 @@ struct scsi_device {
 	unsigned use_10_for_rw:1; /* first try 10-byte read / write */
 	unsigned use_10_for_ms:1; /* first try 10-byte mode sense/select */
 	unsigned set_dbd_for_ms:1; /* Set "DBD" field in mode sense */
+	unsigned read_before_ms:1;	/* perform a READ before MODE SENSE */
 	unsigned no_report_opcodes:1;	/* no REPORT SUPPORTED OPERATION CODES */
 	unsigned no_write_same:1;	/* no WRITE SAME command */
 	unsigned use_16_for_rw:1; /* Use read/write(16) over read/write(10) */

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: Broken Domain Validation in 6.1.84+
  2024-04-08 17:19                 ` Martin K. Petersen
@ 2024-04-11  4:40                   ` Cyril Brulebois
  2024-04-11  7:29                   ` Greg KH
  1 sibling, 0 replies; 5+ messages in thread
From: Cyril Brulebois @ 2024-04-11  4:40 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: John David Anglin, James Bottomley, Bart Van Assche, linux-parisc,
	linux-scsi, Greg KH, stable

[-- Attachment #1: Type: text/plain, Size: 1271 bytes --]

Hi,

Martin K. Petersen <martin.petersen@oracle.com> (2024-04-08):
> Great, thanks for testing!
> 
> Greg, please revert the following commits from linux-6.1.y:
> 
> b73dd5f99972 ("scsi: sd: usb_storage: uas: Access media prior to querying device properties")
> cf33e6ca12d8 ("scsi: core: Add struct for args to execution functions")
> 
> and include the patch below instead.

According to James I've ran into another expression of the same issue,
which in my case led to the loss of some SMART information:
  https://lore.kernel.org/stable/20240410193207.qnb75osxuk4ovvm6@mraw.org/
  https://lore.kernel.org/stable/0655cea93e52928d3e4a12b4fe2d2a4375492ed3.camel@linux.ibm.com/

I've just confirmed that both reverts plus that patch, applied on top of
v6.1.85, fix that regression for me. That's tested in a QEMU VM with a
SATA disk (that exposes only a few SMART attributes anyway) and also on
baremetal with real disks (2 pairs of Seagate IronWolf): smartctl
returns data again, including temperatures.

Closes: https://bugs.debian.org/1068675
Tested-by: Cyril Brulebois <kibi@debian.org>


Cheers,
-- 
Cyril Brulebois (kibi@debian.org)            <https://debamax.com/>
D-I release manager -- Release team member -- Freelance Consultant

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Broken Domain Validation in 6.1.84+
  2024-04-08 17:19                 ` Martin K. Petersen
  2024-04-11  4:40                   ` Cyril Brulebois
@ 2024-04-11  7:29                   ` Greg KH
  1 sibling, 0 replies; 5+ messages in thread
From: Greg KH @ 2024-04-11  7:29 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: John David Anglin, James Bottomley, Bart Van Assche, linux-parisc,
	linux-scsi, stable

On Mon, Apr 08, 2024 at 01:19:50PM -0400, Martin K. Petersen wrote:
> 
> Dave,
> 
> >> Could you please try the patch below on top of v6.1.80?
> > Works okay on top of v6.1.80:
> >
> > [   30.952668] scsi 6:0:0:0: Direct-Access     HP 73.4G ST373207LW       HPC1 PQ: 0 ANSI: 3
> > [   31.072592] scsi target6:0:0: Beginning Domain Validation
> > [   31.139334] scsi 6:0:0:0: Power-on or device reset occurred
> > [   31.186227] scsi target6:0:0: Ending Domain Validation
> > [   31.240482] scsi target6:0:0: FAST-160 WIDE SCSI 320.0 MB/s DT IU QAS RTI WRFLOW PCOMP (6.25 ns, offset 63)
> > [   31.462587] ata5: SATA link down (SStatus 0 SControl 0)
> > [   31.618798] scsi 6:0:2:0: Direct-Access     HP 73.4G ST373207LW       HPC1 PQ: 0 ANSI: 3
> > [   31.732588] scsi target6:0:2: Beginning Domain Validation
> > [   31.799201] scsi 6:0:2:0: Power-on or device reset occurred
> > [   31.846724] scsi target6:0:2: Ending Domain Validation
> > [   31.900822] scsi target6:0:2: FAST-160 WIDE SCSI 320.0 MB/s DT IU QAS RTI WRFLOW PCOMP (6.25 ns, offset 63)
> 
> Great, thanks for testing!
> 
> Greg, please revert the following commits from linux-6.1.y:
> 
> b73dd5f99972 ("scsi: sd: usb_storage: uas: Access media prior to querying device properties")
> cf33e6ca12d8 ("scsi: core: Add struct for args to execution functions")
> 
> and include the patch below instead.

Now done, thanks!

greg k-h

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2024-04-11  7:29 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <b0670b6f-b7f7-4212-9802-7773dcd7206e@bell.net>
     [not found] ` <d1fc0b8d-4858-4234-8b66-c8980f612ea2@acm.org>
     [not found]   ` <db784080-2268-4e6d-84bd-b33055a3331b@bell.net>
     [not found]     ` <028352c6-7e34-4267-bbff-10c93d3596d3@acm.org>
     [not found]       ` <cf78b204-9149-4462-8e82-b8f98859004b@bell.net>
     [not found]         ` <6cb06622e6add6309e8dbb9a8944d53d1b9c4aaa.camel@HansenPartnership.com>
     [not found]           ` <03ef7afd-98f5-4f1b-8330-329f47139ddf@bell.net>
2024-04-06 18:51             ` Broken Domain Validation in 6.1.84+ James Bottomley
2024-04-07  6:16               ` Greg KH
     [not found]             ` <yq1wmp9pb0d.fsf@ca-mkp.ca.oracle.com>
     [not found]               ` <b3df77f6-2928-46cd-a7ee-f806d4c937d1@bell.net>
2024-04-08 17:19                 ` Martin K. Petersen
2024-04-11  4:40                   ` Cyril Brulebois
2024-04-11  7:29                   ` Greg KH

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox