* Re: Broken Domain Validation in 6.1.84+ [not found] ` <03ef7afd-98f5-4f1b-8330-329f47139ddf@bell.net> @ 2024-04-06 18:51 ` James Bottomley 2024-04-07 6:16 ` Greg KH [not found] ` <yq1wmp9pb0d.fsf@ca-mkp.ca.oracle.com> 1 sibling, 1 reply; 5+ messages in thread From: James Bottomley @ 2024-04-06 18:51 UTC (permalink / raw) To: John David Anglin, Bart Van Assche, linux-parisc; +Cc: linux-scsi, stable [cc stable to see if they have any ideas about fixing this] On Sat, 2024-04-06 at 12:16 -0400, John David Anglin wrote: > On 2024-04-06 11:06 a.m., James Bottomley wrote: > > On Sat, 2024-04-06 at 10:30 -0400, John David Anglin wrote: > > > On 2024-04-05 3:36 p.m., Bart Van Assche wrote: > > > > On 4/4/24 13:07, John David Anglin wrote: > > > > > On 2024-04-04 12:32 p.m., Bart Van Assche wrote: > > > > > > Can you please help with verifying whether this kernel warn > > > > > > ing is only triggered by the 6.1 stable kernel series or > > > > > > whether it is also > > > > > > triggered by a vanilla kernel, e.g. kernel v6.8? That will > > > > > > tell us whether we > > > > > > need to review the upstream changes or the backp > > > > > > orts on the v6.1 branch. > > > > > Stable kernel v6.8.3 is okay. > > > > Would it be possible to bisect this issue on the linux-6.1.y > > > > branch? That probably will be faster than reviewing all > > > > backports > > > > of SCSI patches on that branch. > > > The warning triggers with v6.1.81. It doesn't trigger with > > > v6.1.80. > > It's this patch: > > > > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-6.1.y&id=cf33e6ca12d814e1be2263cb76960d0019d7fb94 > > > > The specific problem being that the update to scsi_execute doesn't > > set the sense_len that the WARN_ON is checking. > > > > This isn't a problem in mainline because we've converted all uses > > of scsi_execute. Stable needs to either complete the conversion or > > back out the inital patch. This change depends on the above change: > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-6.1.y&id=b73dd5f9997279715cd450ee8ca599aaff2eabb9 > > Thus, more than just the initial patch needs to be backed out. OK, so the reason the bad patch got pulled in is because it's a precursor of this fixes tagged backport: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-6.1.y&id=b73dd5f9997279715cd450ee8ca599aaff2eabb9 Which is presumably the other patch you had to back out to fix the issue. The problem is that Mike's series updating and then removing scsi_execute() went into the tree as one series, so no-one notice the first patch had this bug because the buggy routine got removed at the end of the series. This also means there's nothing to fix and backport in upstream. The bug is also more widely spread than simply domain validation, because every use of scsi_execute in the current stable tree will trip this. I'm not sure what the best fix is. I can certainly come up with a one line fix for stable adding the missing length in the #define, but it can't come from upstream as stated above. We could back the two patches out then do a stable specific fix for the UAS problem (I don't think we can leave the UAS patch backed out because the problem was pretty serious). What does stable want to do? James ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Broken Domain Validation in 6.1.84+ 2024-04-06 18:51 ` Broken Domain Validation in 6.1.84+ James Bottomley @ 2024-04-07 6:16 ` Greg KH 0 siblings, 0 replies; 5+ messages in thread From: Greg KH @ 2024-04-07 6:16 UTC (permalink / raw) To: James Bottomley Cc: John David Anglin, Bart Van Assche, linux-parisc, linux-scsi, stable On Sat, Apr 06, 2024 at 02:51:36PM -0400, James Bottomley wrote: > [cc stable to see if they have any ideas about fixing this] > On Sat, 2024-04-06 at 12:16 -0400, John David Anglin wrote: > > On 2024-04-06 11:06 a.m., James Bottomley wrote: > > > On Sat, 2024-04-06 at 10:30 -0400, John David Anglin wrote: > > > > On 2024-04-05 3:36 p.m., Bart Van Assche wrote: > > > > > On 4/4/24 13:07, John David Anglin wrote: > > > > > > On 2024-04-04 12:32 p.m., Bart Van Assche wrote: > > > > > > > Can you please help with verifying whether this kernel warn > > > > > > > ing is only triggered by the 6.1 stable kernel series or > > > > > > > whether it is also > > > > > > > triggered by a vanilla kernel, e.g. kernel v6.8? That will > > > > > > > tell us whether we > > > > > > > need to review the upstream changes or the backp > > > > > > > orts on the v6.1 branch. > > > > > > Stable kernel v6.8.3 is okay. > > > > > Would it be possible to bisect this issue on the linux-6.1.y > > > > > branch? That probably will be faster than reviewing all > > > > > backports > > > > > of SCSI patches on that branch. > > > > The warning triggers with v6.1.81. It doesn't trigger with > > > > v6.1.80. > > > It's this patch: > > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-6.1.y&id=cf33e6ca12d814e1be2263cb76960d0019d7fb94 > > > > > > The specific problem being that the update to scsi_execute doesn't > > > set the sense_len that the WARN_ON is checking. > > > > > > This isn't a problem in mainline because we've converted all uses > > > of scsi_execute. Stable needs to either complete the conversion or > > > back out the inital patch. This change depends on the above change: > > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-6.1.y&id=b73dd5f9997279715cd450ee8ca599aaff2eabb9 > > > > Thus, more than just the initial patch needs to be backed out. > > OK, so the reason the bad patch got pulled in is because it's a > precursor of this fixes tagged backport: > > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-6.1.y&id=b73dd5f9997279715cd450ee8ca599aaff2eabb9 > > Which is presumably the other patch you had to back out to fix the > issue. > > The problem is that Mike's series updating and then removing > scsi_execute() went into the tree as one series, so no-one notice the > first patch had this bug because the buggy routine got removed at the > end of the series. This also means there's nothing to fix and backport > in upstream. > > The bug is also more widely spread than simply domain validation, > because every use of scsi_execute in the current stable tree will trip > this. > > I'm not sure what the best fix is. I can certainly come up with a one > line fix for stable adding the missing length in the #define, but it > can't come from upstream as stated above. We could back the two > patches out then do a stable specific fix for the UAS problem (I don't > think we can leave the UAS patch backed out because the problem was > pretty serious). > > What does stable want to do? We want to do whatever is in Linus's tree if at all possible. Or revert anything we applied that we shouldn't have. Either one is fine with us, just let us know what to do here. thanks, greg k-h ^ permalink raw reply [flat|nested] 5+ messages in thread
[parent not found: <yq1wmp9pb0d.fsf@ca-mkp.ca.oracle.com>]
[parent not found: <b3df77f6-2928-46cd-a7ee-f806d4c937d1@bell.net>]
* Re: Broken Domain Validation in 6.1.84+ [not found] ` <b3df77f6-2928-46cd-a7ee-f806d4c937d1@bell.net> @ 2024-04-08 17:19 ` Martin K. Petersen 2024-04-11 4:40 ` Cyril Brulebois 2024-04-11 7:29 ` Greg KH 0 siblings, 2 replies; 5+ messages in thread From: Martin K. Petersen @ 2024-04-08 17:19 UTC (permalink / raw) To: John David Anglin Cc: Martin K. Petersen, James Bottomley, Bart Van Assche, linux-parisc, linux-scsi, Greg KH, stable Dave, >> Could you please try the patch below on top of v6.1.80? > Works okay on top of v6.1.80: > > [ 30.952668] scsi 6:0:0:0: Direct-Access HP 73.4G ST373207LW HPC1 PQ: 0 ANSI: 3 > [ 31.072592] scsi target6:0:0: Beginning Domain Validation > [ 31.139334] scsi 6:0:0:0: Power-on or device reset occurred > [ 31.186227] scsi target6:0:0: Ending Domain Validation > [ 31.240482] scsi target6:0:0: FAST-160 WIDE SCSI 320.0 MB/s DT IU QAS RTI WRFLOW PCOMP (6.25 ns, offset 63) > [ 31.462587] ata5: SATA link down (SStatus 0 SControl 0) > [ 31.618798] scsi 6:0:2:0: Direct-Access HP 73.4G ST373207LW HPC1 PQ: 0 ANSI: 3 > [ 31.732588] scsi target6:0:2: Beginning Domain Validation > [ 31.799201] scsi 6:0:2:0: Power-on or device reset occurred > [ 31.846724] scsi target6:0:2: Ending Domain Validation > [ 31.900822] scsi target6:0:2: FAST-160 WIDE SCSI 320.0 MB/s DT IU QAS RTI WRFLOW PCOMP (6.25 ns, offset 63) Great, thanks for testing! Greg, please revert the following commits from linux-6.1.y: b73dd5f99972 ("scsi: sd: usb_storage: uas: Access media prior to querying device properties") cf33e6ca12d8 ("scsi: core: Add struct for args to execution functions") and include the patch below instead. Thank you! -- Martin K. Petersen Oracle Linux Engineering From 87441914d491c01b73b949663c101056a9d9b8c7 Mon Sep 17 00:00:00 2001 From: "Martin K. Petersen" <martin.petersen@oracle.com> Date: Tue, 13 Feb 2024 09:33:06 -0500 Subject: [PATCH] scsi: sd: usb_storage: uas: Access media prior to querying device properties [ Upstream commit 321da3dc1f3c92a12e3c5da934090d2992a8814c ] It has been observed that some USB/UAS devices return generic properties hardcoded in firmware for mode pages for a period of time after a device has been discovered. The reported properties are either garbage or they do not accurately reflect the characteristics of the physical storage device attached in the case of a bridge. Prior to commit 1e029397d12f ("scsi: sd: Reorganize DIF/DIX code to avoid calling revalidate twice") we would call revalidate several times during device discovery. As a result, incorrect values would eventually get replaced with ones accurately describing the attached storage. When we did away with the redundant revalidate pass, several cases were reported where devices reported nonsensical values or would end up in write-protected state. An initial attempt at addressing this issue involved introducing a delayed second revalidate invocation. However, this approach still left some devices reporting incorrect characteristics. Tasos Sahanidis debugged the problem further and identified that introducing a READ operation prior to MODE SENSE fixed the problem and that it wasn't a timing issue. Issuing a READ appears to cause the devices to update their state to reflect the actual properties of the storage media. Device properties like vendor, model, and storage capacity appear to be correctly reported from the get-go. It is unclear why these devices defer populating the remaining characteristics. Match the behavior of a well known commercial operating system and trigger a READ operation prior to querying device characteristics to force the device to populate the mode pages. The additional READ is triggered by a flag set in the USB storage and UAS drivers. We avoid issuing the READ for other transport classes since some storage devices identify Linux through our particular discovery command sequence. Link: https://lore.kernel.org/r/20240213143306.2194237-1-martin.petersen@oracle.com Fixes: 1e029397d12f ("scsi: sd: Reorganize DIF/DIX code to avoid calling revalidate twice") Cc: stable@vger.kernel.org Reported-by: Tasos Sahanidis <tasos@tasossah.com> Reviewed-by: Ewan D. Milne <emilne@redhat.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Tested-by: Tasos Sahanidis <tasos@tasossah.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c index 31b5273f43a7..349b1455a2c6 100644 --- a/drivers/scsi/sd.c +++ b/drivers/scsi/sd.c @@ -3284,6 +3284,24 @@ static bool sd_validate_opt_xfer_size(struct scsi_disk *sdkp, return true; } +static void sd_read_block_zero(struct scsi_disk *sdkp) +{ + unsigned int buf_len = sdkp->device->sector_size; + char *buffer, cmd[10] = { }; + + buffer = kmalloc(buf_len, GFP_KERNEL); + if (!buffer) + return; + + cmd[0] = READ_10; + put_unaligned_be32(0, &cmd[2]); /* Logical block address 0 */ + put_unaligned_be16(1, &cmd[7]); /* Transfer 1 logical block */ + + scsi_execute_req(sdkp->device, cmd, DMA_FROM_DEVICE, buffer, buf_len, + NULL, SD_TIMEOUT, sdkp->max_retries, NULL); + kfree(buffer); +} + /** * sd_revalidate_disk - called the first time a new disk is seen, * performs disk spin up, read_capacity, etc. @@ -3323,7 +3341,13 @@ static int sd_revalidate_disk(struct gendisk *disk) */ if (sdkp->media_present) { sd_read_capacity(sdkp, buffer); - + /* + * Some USB/UAS devices return generic values for mode pages + * until the media has been accessed. Trigger a READ operation + * to force the device to populate mode pages. + */ + if (sdp->read_before_ms) + sd_read_block_zero(sdkp); /* * set the default to rotational. All non-rotational devices * support the block characteristics VPD page, which will diff --git a/drivers/usb/storage/scsiglue.c b/drivers/usb/storage/scsiglue.c index c54e9805da53..12cf9940e5b6 100644 --- a/drivers/usb/storage/scsiglue.c +++ b/drivers/usb/storage/scsiglue.c @@ -179,6 +179,13 @@ static int slave_configure(struct scsi_device *sdev) */ sdev->use_192_bytes_for_3f = 1; + /* + * Some devices report generic values until the media has been + * accessed. Force a READ(10) prior to querying device + * characteristics. + */ + sdev->read_before_ms = 1; + /* * Some devices don't like MODE SENSE with page=0x3f, * which is the command used for checking if a device diff --git a/drivers/usb/storage/uas.c b/drivers/usb/storage/uas.c index de3836412bf3..ed22053b3252 100644 --- a/drivers/usb/storage/uas.c +++ b/drivers/usb/storage/uas.c @@ -878,6 +878,13 @@ static int uas_slave_configure(struct scsi_device *sdev) if (devinfo->flags & US_FL_CAPACITY_HEURISTICS) sdev->guess_capacity = 1; + /* + * Some devices report generic values until the media has been + * accessed. Force a READ(10) prior to querying device + * characteristics. + */ + sdev->read_before_ms = 1; + /* * Some devices don't like MODE SENSE with page=0x3f, * which is the command used for checking if a device diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h index d2751ed536df..1504d3137cc6 100644 --- a/include/scsi/scsi_device.h +++ b/include/scsi/scsi_device.h @@ -204,6 +204,7 @@ struct scsi_device { unsigned use_10_for_rw:1; /* first try 10-byte read / write */ unsigned use_10_for_ms:1; /* first try 10-byte mode sense/select */ unsigned set_dbd_for_ms:1; /* Set "DBD" field in mode sense */ + unsigned read_before_ms:1; /* perform a READ before MODE SENSE */ unsigned no_report_opcodes:1; /* no REPORT SUPPORTED OPERATION CODES */ unsigned no_write_same:1; /* no WRITE SAME command */ unsigned use_16_for_rw:1; /* Use read/write(16) over read/write(10) */ ^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: Broken Domain Validation in 6.1.84+ 2024-04-08 17:19 ` Martin K. Petersen @ 2024-04-11 4:40 ` Cyril Brulebois 2024-04-11 7:29 ` Greg KH 1 sibling, 0 replies; 5+ messages in thread From: Cyril Brulebois @ 2024-04-11 4:40 UTC (permalink / raw) To: Martin K. Petersen Cc: John David Anglin, James Bottomley, Bart Van Assche, linux-parisc, linux-scsi, Greg KH, stable [-- Attachment #1: Type: text/plain, Size: 1271 bytes --] Hi, Martin K. Petersen <martin.petersen@oracle.com> (2024-04-08): > Great, thanks for testing! > > Greg, please revert the following commits from linux-6.1.y: > > b73dd5f99972 ("scsi: sd: usb_storage: uas: Access media prior to querying device properties") > cf33e6ca12d8 ("scsi: core: Add struct for args to execution functions") > > and include the patch below instead. According to James I've ran into another expression of the same issue, which in my case led to the loss of some SMART information: https://lore.kernel.org/stable/20240410193207.qnb75osxuk4ovvm6@mraw.org/ https://lore.kernel.org/stable/0655cea93e52928d3e4a12b4fe2d2a4375492ed3.camel@linux.ibm.com/ I've just confirmed that both reverts plus that patch, applied on top of v6.1.85, fix that regression for me. That's tested in a QEMU VM with a SATA disk (that exposes only a few SMART attributes anyway) and also on baremetal with real disks (2 pairs of Seagate IronWolf): smartctl returns data again, including temperatures. Closes: https://bugs.debian.org/1068675 Tested-by: Cyril Brulebois <kibi@debian.org> Cheers, -- Cyril Brulebois (kibi@debian.org) <https://debamax.com/> D-I release manager -- Release team member -- Freelance Consultant [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Broken Domain Validation in 6.1.84+ 2024-04-08 17:19 ` Martin K. Petersen 2024-04-11 4:40 ` Cyril Brulebois @ 2024-04-11 7:29 ` Greg KH 1 sibling, 0 replies; 5+ messages in thread From: Greg KH @ 2024-04-11 7:29 UTC (permalink / raw) To: Martin K. Petersen Cc: John David Anglin, James Bottomley, Bart Van Assche, linux-parisc, linux-scsi, stable On Mon, Apr 08, 2024 at 01:19:50PM -0400, Martin K. Petersen wrote: > > Dave, > > >> Could you please try the patch below on top of v6.1.80? > > Works okay on top of v6.1.80: > > > > [ 30.952668] scsi 6:0:0:0: Direct-Access HP 73.4G ST373207LW HPC1 PQ: 0 ANSI: 3 > > [ 31.072592] scsi target6:0:0: Beginning Domain Validation > > [ 31.139334] scsi 6:0:0:0: Power-on or device reset occurred > > [ 31.186227] scsi target6:0:0: Ending Domain Validation > > [ 31.240482] scsi target6:0:0: FAST-160 WIDE SCSI 320.0 MB/s DT IU QAS RTI WRFLOW PCOMP (6.25 ns, offset 63) > > [ 31.462587] ata5: SATA link down (SStatus 0 SControl 0) > > [ 31.618798] scsi 6:0:2:0: Direct-Access HP 73.4G ST373207LW HPC1 PQ: 0 ANSI: 3 > > [ 31.732588] scsi target6:0:2: Beginning Domain Validation > > [ 31.799201] scsi 6:0:2:0: Power-on or device reset occurred > > [ 31.846724] scsi target6:0:2: Ending Domain Validation > > [ 31.900822] scsi target6:0:2: FAST-160 WIDE SCSI 320.0 MB/s DT IU QAS RTI WRFLOW PCOMP (6.25 ns, offset 63) > > Great, thanks for testing! > > Greg, please revert the following commits from linux-6.1.y: > > b73dd5f99972 ("scsi: sd: usb_storage: uas: Access media prior to querying device properties") > cf33e6ca12d8 ("scsi: core: Add struct for args to execution functions") > > and include the patch below instead. Now done, thanks! greg k-h ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2024-04-11 7:29 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <b0670b6f-b7f7-4212-9802-7773dcd7206e@bell.net>
[not found] ` <d1fc0b8d-4858-4234-8b66-c8980f612ea2@acm.org>
[not found] ` <db784080-2268-4e6d-84bd-b33055a3331b@bell.net>
[not found] ` <028352c6-7e34-4267-bbff-10c93d3596d3@acm.org>
[not found] ` <cf78b204-9149-4462-8e82-b8f98859004b@bell.net>
[not found] ` <6cb06622e6add6309e8dbb9a8944d53d1b9c4aaa.camel@HansenPartnership.com>
[not found] ` <03ef7afd-98f5-4f1b-8330-329f47139ddf@bell.net>
2024-04-06 18:51 ` Broken Domain Validation in 6.1.84+ James Bottomley
2024-04-07 6:16 ` Greg KH
[not found] ` <yq1wmp9pb0d.fsf@ca-mkp.ca.oracle.com>
[not found] ` <b3df77f6-2928-46cd-a7ee-f806d4c937d1@bell.net>
2024-04-08 17:19 ` Martin K. Petersen
2024-04-11 4:40 ` Cyril Brulebois
2024-04-11 7:29 ` Greg KH
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox