From: Damien Le Moal <dlemoal@kernel.org>
To: Friedrich Weber <f.weber@proxmox.com>,
Niklas Cassel <nks@flawful.org>, Jens Axboe <axboe@kernel.dk>,
"Martin K. Petersen" <martin.petersen@oracle.com>,
"James E.J. Bottomley" <jejb@linux.ibm.com>
Cc: Bart Van Assche <bvanassche@acm.org>,
Christoph Hellwig <hch@lst.de>, Hannes Reinecke <hare@suse.de>,
linux-scsi@vger.kernel.org, linux-ide@vger.kernel.org,
linux-block@vger.kernel.org,
Niklas Cassel <niklas.cassel@wdc.com>,
Mira Limbeck <m.limbeck@proxmox.com>
Subject: Re: [PATCH v7 08/19] scsi: detect support for command duration limits
Date: Wed, 30 Apr 2025 08:39:20 -0500
Message-ID: <6fb8499a-b5bc-4d41-bf37-32ebdea43e9a@kernel.org>
In-Reply-To: <3dee186c-285e-4c1c-b879-6445eb2f3edf@proxmox.com>
On 2025/04/30 7:13, Friedrich Weber wrote:
> Hi,
>
> One of our users reports that, in their setup, hotplugging new disks doesn't
> work anymore with recent kernels (details below). The issue appeared somewhere
> between kernels 6.4 and 6.5, and they bisected the change to this patch:
>
> 624885209f31 (scsi: core: Detect support for command duration limits)
>
> The issue is also reproducible on a mainline kernel 6.14.4 build from [1]. When
> hotplugging a disk under 6.14.4, the following is logged (I've redacted some
> identifiers, let me know in case I've been too overzealous with that):
>
> Apr 28 16:41:13 pbs-disklab kernel: mpt3sas_cm0: handle(0xa) sas_address(0xREDACTED_SAS_ADDR) port_type(0x1)
> Apr 28 16:41:13 pbs-disklab kernel: scsi 5:0:1:0: Direct-Access WDC REDACTED_SN C5C0 PQ: 0 ANSI: 7
> Apr 28 16:41:13 pbs-disklab kernel: scsi 5:0:1:0: SSP: handle(0x000a), sas_addr(0xREDACTED_SAS_ADDR), phy(2), device_name(REDACTED_DEVICE_NAME)
> Apr 28 16:41:13 pbs-disklab kernel: scsi 5:0:1:0: enclosure logical id (REDACTED_LOGICAL_ID), slot(0)
> Apr 28 16:41:13 pbs-disklab kernel: scsi 5:0:1:0: enclosure level(0x0000), connector name( )
> Apr 28 16:41:13 pbs-disklab kernel: scsi 5:0:1:0: qdepth(254), tagged(1), scsi_level(8), cmd_que(1)
> Apr 28 16:41:13 pbs-disklab kernel: scsi 5:0:1:0: Power-on or device reset occurred
> Apr 28 16:41:16 pbs-disklab kernel: mpt3sas_cm0: log_info(0x31110e05): originator(PL), code(0x11), sub_code(0x0e05)

This decodes to:

  Code:     00110000h PL_LOGINFO_CODE_RESET See Sub-Codes below (PL_LOGINFO_SUB_CODE)
  Sub Code: 00000E00h PL_LOGINFO_SUB_CODE_DISCOVERY_SATA_ERR

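For anyone wanting to split such log_info words by hand, here is a minimal
sketch. It assumes the bit-field layout I believe the mpt3sas driver uses
(bus type in bits 31-28, originator in bits 27-24, code in bits 23-16,
sub-code in bits 15-0). The function and table names are illustrative only.

```python
# Sketch: split an mpt3sas log_info word into its fields.
# Assumed layout (not taken from this thread):
#   bits 31-28 bus type, 27-24 originator, 23-16 code, 15-0 sub-code.
ORIGINATORS = {0: "IOP", 1: "PL", 2: "IR"}  # assumed originator names


def decode_log_info(value):
    """Return (bus_type, originator, code, sub_code) for a 32-bit log_info."""
    bus_type = (value >> 28) & 0xF
    originator = ORIGINATORS.get((value >> 24) & 0xF, "unknown")
    code = (value >> 16) & 0xFF
    sub_code = value & 0xFFFF
    return bus_type, originator, code, sub_code


if __name__ == "__main__":
    # The two log_info words from the quoted log.
    for word in (0x31110E05, 0x31130000):
        bus, orig, code, sub = decode_log_info(word)
        print(f"log_info(0x{word:08x}): bus_type({bus}) originator({orig}) "
              f"code(0x{code:02x}) sub_code(0x{sub:04x})")
```

The code and sub_code values this yields for the two words match what the
driver printed in the log above.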
> Apr 28 16:41:18 pbs-disklab kernel: mpt3sas_cm0: log_info(0x31130000): originator(PL), code(0x13), sub_code(0x0000)
> Apr 28 16:41:18 pbs-disklab kernel: sd 5:0:1:0: Attached scsi generic sg1 type 0
> Apr 28 16:41:18 pbs-disklab kernel: sd 5:0:1:0: [sdb] Test Unit Ready failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
> Apr 28 16:41:18 pbs-disklab kernel: sd 5:0:1:0: [sdb] Read Capacity(16) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
> Apr 28 16:41:18 pbs-disklab kernel: sd 5:0:1:0: [sdb] Sense not available.
> Apr 28 16:41:18 pbs-disklab kernel: sd 5:0:1:0: [sdb] Read Capacity(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
> Apr 28 16:41:18 pbs-disklab kernel: sd 5:0:1:0: [sdb] Sense not available.
> Apr 28 16:41:18 pbs-disklab kernel: sd 5:0:1:0: [sdb] 0 512-byte logical blocks: (0 B/0 B)
> Apr 28 16:41:18 pbs-disklab kernel: sd 5:0:1:0: [sdb] 0-byte physical blocks
> Apr 28 16:41:18 pbs-disklab kernel: sd 5:0:1:0: [sdb] Test WP failed, assume Write Enabled
> Apr 28 16:41:18 pbs-disklab kernel: sd 5:0:1:0: [sdb] Asking for cache data failed
> Apr 28 16:41:18 pbs-disklab kernel: sd 5:0:1:0: [sdb] Assuming drive cache: write through
> Apr 28 16:41:18 pbs-disklab kernel: end_device-5:1: add: handle(0x000a), sas_addr(0xREDACTED_SAS_ADDR)
> Apr 28 16:41:18 pbs-disklab kernel: mpt3sas_cm0: handle(0x000a), ioc_status(0x0022) failure at drivers/scsi/mpt3sas/mpt3sas_transport.c:225/_transport_set_identify()!
> Apr 28 16:41:18 pbs-disklab kernel: sd 5:0:1:0: [sdb] Attached SCSI disk
> Apr 28 16:41:18 pbs-disklab kernel: mpt3sas_cm0: mpt3sas_transport_port_remove: removed: sas_addr(0xREDACTED_SAS_ADDR)
> Apr 28 16:41:18 pbs-disklab kernel: mpt3sas_cm0: removing handle(0x000a), sas_addr(0xREDACTED_SAS_ADDR)
> Apr 28 16:41:18 pbs-disklab kernel: mpt3sas_cm0: enclosure logical id(REDACTED_LOGICAL_ID), slot(0)
> Apr 28 16:41:18 pbs-disklab kernel: mpt3sas_cm0: enclosure level(0x0000), connector name( )
>
> and the block device isn't accessible afterwards. It does seem to be visible
> after a reboot.
>
> lspci on this host shows:
>
> 02:00.0 Serial Attached SCSI controller [0107]: Broadcom / LSI SAS3008 PCI-Express Fusion-MPT SAS-3 [1000:0097] (rev 02)
> Subsystem: Broadcom / LSI SAS9300-8i [1000:30e0]
> Kernel driver in use: mpt3sas
> Kernel modules: mpt3sas
>
> The HBA is placed on a PCIe 3.0 x8 slot (not bifurcated) and connected via
> SFF-8643 to a simple 2U 12xLFF SAS3 Supermicro box. The user can also reproduce
> the issue with other HBAs with e.g. the SAS3108 and SAS3816 chipsets.
>
> The device doesn't seem to support CDL. So if I see correctly, the only
> effective change introduced by the patch are the four scsi_cdl_check_cmd (and
> thus scsi_report_opcode) calls to check for CDL support. Hence we wondered
> whether these may be the cause of the issue. We ran a few tests to verify:
>
> - disabling "REPORT SUPPORTED OPERATION CODES" by passing
> `scsi_mod.dev_flags=WDC:REDACTED_SN:536870912` (the flag being
> BLIST_NO_RSOC) resolves the issue (hotplug works again), but I imagine
> disabling RSOC altogether isn't a good workaround. This test was not done
> on a mainline kernel, but I don't think it would make a difference.

So it seems that the HBA SAT is choking on the REPORT SUPPORTED OPERATION CODES
command. I have several mpt3sas HBAs and I have never seen this issue when
running the latest FW version for these (EOL) HBAs. So I am tempted to say that
an HBA FW update should resolve the issue. However, I do not recall doing any
drive hotplug tests, and this issue may trigger only with hotplug and not with a
cold start. Can you confirm that?

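For reference, the dev_flags value used in the test quoted above is just the
BLIST_NO_RSOC bit written in decimal. A minimal sketch, assuming that flag is
bit 29 of the blist flags (which matches 536870912) and that the parameter
format is vendor:model:flags. The helper name is hypothetical.

```python
# Sketch: build the scsi_mod.dev_flags entry used in the quoted test.
# Assumption: BLIST_NO_RSOC is bit 29 of the blist flags, which matches
# the decimal value 536870912 given in the report.
BLIST_NO_RSOC = 1 << 29


def dev_flags_param(vendor, model, flags):
    """Format a scsi_mod.dev_flags entry as vendor:model:flags."""
    return f"scsi_mod.dev_flags={vendor}:{model}:{flags}"


if __name__ == "__main__":
    print(hex(BLIST_NO_RSOC), BLIST_NO_RSOC)
    # Model string left redacted, as in the report.
    print(dev_flags_param("WDC", "REDACTED_SN", BLIST_NO_RSOC))
```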
>
> - we patched out the four calls to scsi_cdl_check_cmd and unconditionally set
> cdl_supported to 0, see [2] for the patch (on top of 6.14.4). This resolves
> the issue.
>
> - I suspected that particularly the two latter scsi_cdl_check_cmd calls with a
> nonzero service action might be problematic, so we patched them out
> specifically but kept the other two calls without a service action, see [3]
> for the patch (on top of 6.14.4). But with this patch, hotplug still does
> not work.
>
> - the RSOC commands themselves don't seem to be problematic per se. We asked
> the user to boot a (non-mainline) kernel with the `scsi_mod.dev_flags`
> parameter to disable RSOC as above, hotplug the disk (this succeeds), and
> then query the four opcodes/service actions using `sg_opcodes`, and this
> looks okay [4] (reporting that CDL is not supported).
>
> I wonder whether these results might suggest the RSOC queries are problematic
> not in general, but at this particular point (during device initialization) in
> this particular hardware setup? If this turns out to be the case -- would it be
> feasible to suppress these RSOC queries if CDL is not enabled via sysfs?

I would be tempted to say that it is indeed the RSOC command handling in the HBA
SAT that has issues, but your command line checks [4] tend to indicate
otherwise. The issue may, however, trigger only with the different timing of a
hotplug. The other possibility is that the RSOC command translation is actually
fine but ends up generating an ATA command that the drive is not happy about,
either because of a drive FW bug or because of when the drive receives that
command. Given that this is a WD drive, I can probably check that if you can
send me the drive model and FW rev (sending that information off-list is fine).

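To make it concrete what the HBA has to handle for each CDL check, here is a
rough sketch of the command and of how its reply is interpreted. It assumes the
CDB layout and the one_command reply fields as I understand them from SPC and
from scsi_report_opcode() and scsi_cdl_check_cmd(). The function names and the
64-byte allocation length are illustrative, not the kernel's definitions.

```python
# Sketch of one CDL probe: build a REPORT SUPPORTED OPERATION CODES CDB
# for a single command and check the CDLP field of the one_command reply.
# Assumptions: field offsets follow my reading of SPC and of the kernel
# helpers; names and the default allocation length are illustrative.
import struct

MAINTENANCE_IN = 0xA3
MI_REPORT_SUPPORTED_OPERATION_CODES = 0x0C


def build_rsoc_cdb(opcode, service_action=0, alloc_len=64):
    """Return the 12-byte RSOC CDB for one opcode (and service action)."""
    cdb = bytearray(12)
    cdb[0] = MAINTENANCE_IN
    cdb[1] = MI_REPORT_SUPPORTED_OPERATION_CODES
    # Reporting options: 1 = opcode only, 3 = opcode plus service action.
    cdb[2] = 3 if service_action else 1
    cdb[3] = opcode
    struct.pack_into(">H", cdb, 4, service_action)  # big-endian service action
    struct.pack_into(">I", cdb, 6, alloc_len)       # big-endian allocation length
    return bytes(cdb)


def reply_indicates_cdl(buf):
    """True if the command is supported and CDLP is 1 or 2."""
    if (buf[1] & 0x03) != 0x03:  # SUPPORT field: 3 = supported per standard
        return False
    cdlp = (buf[1] & 0x18) >> 3  # 0 = no command duration limit mode page
    return cdlp in (1, 2)


if __name__ == "__main__":
    # The four probes: READ(16), WRITE(16), READ(32), WRITE(32).
    for opcode, sa in ((0x88, 0), (0x8A, 0), (0x7F, 0x09), (0x7F, 0x0B)):
        print(build_rsoc_cdb(opcode, sa).hex(" "))
```

With the replies shown in [4] ("No command duration limit mode page"), the CDLP
field would be 0 and the check returns false for all four commands.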
> If you have any ideas for further troubleshooting, we're happy to gather more
> data. I'll be AFK for a few weeks, but Mira (in CC) will take over in the
> meantime.

Checking the HBA FW version would be a start. It would also be nice if you could
confirm whether this issue happens only on hotplug or also during cold boot. I
am traveling right now and will not be able to test hot-plugging drives on my
setups until the end of next week.

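One possible way to collect the FW version is through sysfs. This is only a
sketch, assuming the mpt3sas driver exposes a version_fw attribute under
/sys/class/scsi_host/hostN/ on your kernel. The function name is hypothetical,
and hosts without that attribute are skipped.

```python
# Sketch: list firmware versions reported by SCSI hosts in sysfs.
# Assumption: the HBA driver exposes a "version_fw" host attribute under
# /sys/class/scsi_host/hostN/; hosts without it are silently skipped.
from pathlib import Path


def hba_fw_versions(root="/sys/class/scsi_host"):
    """Return {host_name: firmware_version} for hosts exposing version_fw."""
    versions = {}
    for host in sorted(Path(root).glob("host*")):
        attr = host / "version_fw"
        if attr.is_file():
            versions[host.name] = attr.read_text().strip()
    return versions


if __name__ == "__main__":
    for host, fw in hba_fw_versions().items():
        print(f"{host}: {fw}")
```

The vendor management tools should report the same information if the sysfs
attribute is not available.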
>
> Thanks!
>
> Friedrich
>
> [1] https://kernel.ubuntu.com/mainline/v6.14.4/
>
> [2]
>
> diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
> index a77e0499b738..022b2f9706a4 100644
> --- a/drivers/scsi/scsi.c
> +++ b/drivers/scsi/scsi.c
> @@ -658,11 +658,7 @@ void scsi_cdl_check(struct scsi_device *sdev)
> }
>
> /* Check support for READ_16, WRITE_16, READ_32 and WRITE_32 commands */
> - cdl_supported =
> - scsi_cdl_check_cmd(sdev, READ_16, 0, buf) ||
> - scsi_cdl_check_cmd(sdev, WRITE_16, 0, buf) ||
> - scsi_cdl_check_cmd(sdev, VARIABLE_LENGTH_CMD, READ_32, buf) ||
> - scsi_cdl_check_cmd(sdev, VARIABLE_LENGTH_CMD, WRITE_32, buf);
> + cdl_supported = 0;
> if (cdl_supported) {
> /*
> * We have CDL support: force the use of READ16/WRITE16.
>
> [3]
>
> diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
> index a77e0499b738..6b0f36f5415e 100644
> --- a/drivers/scsi/scsi.c
> +++ b/drivers/scsi/scsi.c
> @@ -660,9 +660,8 @@ void scsi_cdl_check(struct scsi_device *sdev)
> /* Check support for READ_16, WRITE_16, READ_32 and WRITE_32 commands */
> cdl_supported =
> scsi_cdl_check_cmd(sdev, READ_16, 0, buf) ||
> - scsi_cdl_check_cmd(sdev, WRITE_16, 0, buf) ||
> - scsi_cdl_check_cmd(sdev, VARIABLE_LENGTH_CMD, READ_32, buf) ||
> - scsi_cdl_check_cmd(sdev, VARIABLE_LENGTH_CMD, WRITE_32, buf);
> + scsi_cdl_check_cmd(sdev, WRITE_16, 0, buf);
> + cdl_supported = 0;
> if (cdl_supported) {
> /*
> * We have CDL support: force the use of READ16/WRITE16.
>
> [4]
>
> root@pbs-disklab:~# sg_opcodes -o 0x88 /dev/sdb
>
> Opcode=0x88
> Command_name: Read(16)
> Command is supported [conforming to SCSI standard]
> No command duration limit mode page
> Multiple Logical Units (MLU): not reported
> Usage data: 88 fe ff ff ff ff ff ff ff ff ff ff ff ff 00 00
>
> root@pbs-disklab:~# sg_opcodes -o 0x8a /dev/sdb
>
> Opcode=0x8a
> Command_name: Write(16)
> Command is supported [conforming to SCSI standard]
> No command duration limit mode page
> Multiple Logical Units (MLU): not reported
> Usage data: 8a fa ff ff ff ff ff ff ff ff ff ff ff ff 00 00
>
> root@pbs-disklab:~# sg_opcodes -o 0x7f,0x9 /dev/sdb
>
> Opcode=0x7f Service_action=0x0009
> Command_name: Read(32)
> Command is supported [conforming to SCSI standard]
> No command duration limit mode page
> Multiple Logical Units (MLU): not reported
> Usage data: 7f 00 00 00 00 00 00 ff 00 09 fe 00 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>
> root@pbs-disklab:~# sg_opcodes -o 0x7f,0xb /dev/sdb
>
> Opcode=0x7f Service_action=0x000b
> Command_name: Write(32)
> Command is supported [conforming to SCSI standard]
> No command duration limit mode page
> Multiple Logical Units (MLU): not reported
> Usage data: 7f 00 00 00 00 00 00 ff 00 0b fa 00 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>
--
Damien Le Moal
Western Digital Research