linux-ide.vger.kernel.org archive mirror
From: Diangang Li <lidiangang@bytedance.com>
To: Friedrich Weber <f.weber@proxmox.com>
Cc: Niklas Cassel <nks@flawful.org>, Jens Axboe <axboe@kernel.dk>,
	"Martin K. Petersen" <martin.petersen@oracle.com>,
	"James E.J. Bottomley" <jejb@linux.ibm.com>,
	Bart Van Assche <bvanassche@acm.org>,
	Christoph Hellwig <hch@lst.de>, Hannes Reinecke <hare@suse.de>,
	Damien Le Moal <dlemoal@kernel.org>,
	linux-scsi@vger.kernel.org, linux-ide@vger.kernel.org,
	linux-block@vger.kernel.org,
	Niklas Cassel <niklas.cassel@wdc.com>,
	Mira Limbeck <m.limbeck@proxmox.com>
Subject: Re: [PATCH v7 08/19] scsi: detect support for command duration limits
Date: Thu, 31 Jul 2025 19:38:06 +0800
Message-ID: <20250731113806.GA93929@bytedance.com>
In-Reply-To: <3dee186c-285e-4c1c-b879-6445eb2f3edf@proxmox.com>

On Wed, Apr 30, 2025 at 02:13:53PM +0200, Friedrich Weber wrote:
> Hi,
> 
> One of our users reports that, in their setup, hotplugging new disks doesn't
> work anymore with recent kernels (details below). The issue appeared somewhere
> between kernels 6.4 and 6.5, and they bisected the change to this patch:

Hi Friedrich,

Could you confirm which hotplug method was used here? Is it the logical
operation using the following commands:

- echo 1 > /sys/block/sdX/device/delete
- echo - - - > /sys/class/scsi_host/host5/scan

or does it refer to physical hotplugging (physically removing and reinserting the drive)?

I have tested the delete-and-scan method on both the 3008 and 9500 HBAs, and
hotplug worked fine on both.
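
For reference, the delete-and-scan sequence above can be wrapped roughly as
follows. This is only a sketch: the function name logical_hotplug and the
SYSFS_ROOT override are made up for illustration (the override only allows a
dry run against a scratch directory), and sdb/host5 are example names.

```shell
#!/bin/sh
# Sketch of a "logical" hotplug: delete one SCSI disk, then rescan its host.
# SYSFS_ROOT is a hypothetical override for dry runs; it defaults to /sys.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

logical_hotplug() {
    disk="$1"    # block device name, e.g. sdb
    host="$2"    # SCSI host name, e.g. host5

    del="$SYSFS_ROOT/block/$disk/device/delete"
    scan="$SYSFS_ROOT/class/scsi_host/$host/scan"

    # Check both attributes first so the disk is never deleted
    # when the host to rescan does not exist.
    [ -e "$del" ] || { echo "no such disk: $disk" >&2; return 1; }
    [ -e "$scan" ] || { echo "no such host: $host" >&2; return 1; }

    echo 1 > "$del"          # remove the device from the SCSI layer
    echo "- - -" > "$scan"   # wildcard channel/target/LUN rescan
}
```

Usage (as root): logical_hotplug sdb host5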

> 
>   624885209f31 (scsi: core: Detect support for command duration limits)
> 
> The issue is also reproducible on a mainline kernel 6.14.4 build from [1]. When
> hotplugging a disk under 6.14.4, the following is logged (I've redacted some
> identifiers, let me know in case I've been too overzealous with that):
> 
> Apr 28 16:41:13 pbs-disklab kernel: mpt3sas_cm0: handle(0xa) sas_address(0xREDACTED_SAS_ADDR) port_type(0x1)
> Apr 28 16:41:13 pbs-disklab kernel: scsi 5:0:1:0: Direct-Access     WDC      REDACTED_SN  C5C0 PQ: 0 ANSI: 7
> Apr 28 16:41:13 pbs-disklab kernel: scsi 5:0:1:0: SSP: handle(0x000a), sas_addr(0xREDACTED_SAS_ADDR), phy(2), device_name(REDACTED_DEVICE_NAME)
> Apr 28 16:41:13 pbs-disklab kernel: scsi 5:0:1:0: enclosure logical id (REDACTED_LOGICAL_ID), slot(0) 
> Apr 28 16:41:13 pbs-disklab kernel: scsi 5:0:1:0: enclosure level(0x0000), connector name(     )
> Apr 28 16:41:13 pbs-disklab kernel: scsi 5:0:1:0: qdepth(254), tagged(1), scsi_level(8), cmd_que(1)
> Apr 28 16:41:13 pbs-disklab kernel: scsi 5:0:1:0: Power-on or device reset occurred
> Apr 28 16:41:16 pbs-disklab kernel: mpt3sas_cm0: log_info(0x31110e05): originator(PL), code(0x11), sub_code(0x0e05)
> Apr 28 16:41:18 pbs-disklab kernel: mpt3sas_cm0: log_info(0x31130000): originator(PL), code(0x13), sub_code(0x0000)
> Apr 28 16:41:18 pbs-disklab kernel: sd 5:0:1:0: Attached scsi generic sg1 type 0
> Apr 28 16:41:18 pbs-disklab kernel: sd 5:0:1:0: [sdb] Test Unit Ready failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
> Apr 28 16:41:18 pbs-disklab kernel: sd 5:0:1:0: [sdb] Read Capacity(16) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
> Apr 28 16:41:18 pbs-disklab kernel: sd 5:0:1:0: [sdb] Sense not available.
> Apr 28 16:41:18 pbs-disklab kernel: sd 5:0:1:0: [sdb] Read Capacity(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
> Apr 28 16:41:18 pbs-disklab kernel: sd 5:0:1:0: [sdb] Sense not available.
> Apr 28 16:41:18 pbs-disklab kernel: sd 5:0:1:0: [sdb] 0 512-byte logical blocks: (0 B/0 B)
> Apr 28 16:41:18 pbs-disklab kernel: sd 5:0:1:0: [sdb] 0-byte physical blocks
> Apr 28 16:41:18 pbs-disklab kernel: sd 5:0:1:0: [sdb] Test WP failed, assume Write Enabled
> Apr 28 16:41:18 pbs-disklab kernel: sd 5:0:1:0: [sdb] Asking for cache data failed
> Apr 28 16:41:18 pbs-disklab kernel: sd 5:0:1:0: [sdb] Assuming drive cache: write through
> Apr 28 16:41:18 pbs-disklab kernel:  end_device-5:1: add: handle(0x000a), sas_addr(0xREDACTED_SAS_ADDR)
> Apr 28 16:41:18 pbs-disklab kernel: mpt3sas_cm0: handle(0x000a), ioc_status(0x0022) failure at drivers/scsi/mpt3sas/mpt3sas_transport.c:225/_transport_set_identify()!
> Apr 28 16:41:18 pbs-disklab kernel: sd 5:0:1:0: [sdb] Attached SCSI disk
> Apr 28 16:41:18 pbs-disklab kernel: mpt3sas_cm0: mpt3sas_transport_port_remove: removed: sas_addr(0xREDACTED_SAS_ADDR)
> Apr 28 16:41:18 pbs-disklab kernel: mpt3sas_cm0: removing handle(0x000a), sas_addr(0xREDACTED_SAS_ADDR)
> Apr 28 16:41:18 pbs-disklab kernel: mpt3sas_cm0: enclosure logical id(REDACTED_LOGICAL_ID), slot(0)
> Apr 28 16:41:18 pbs-disklab kernel: mpt3sas_cm0: enclosure level(0x0000), connector name(     )
> 
> and the block device isn't accessible afterwards. It does seem to be visible
> after a reboot.
> 
> lspci on this host shows:
> 
> 02:00.0 Serial Attached SCSI controller [0107]: Broadcom / LSI SAS3008 PCI-Express Fusion-MPT SAS-3 [1000:0097] (rev 02)
> 	Subsystem: Broadcom / LSI SAS9300-8i [1000:30e0]
> 	Kernel driver in use: mpt3sas
> 	Kernel modules: mpt3sas
> 
> The HBA is placed on a PCIe 3.0 x8 slot (not bifurcated) and connected via
> SFF-8643 to a simple 2U 12xLFF SAS3 Supermicro box. The user can also reproduce
> the issue with other HBAs with e.g. the SAS3108 and SAS3816 chipsets.
> 
> The device doesn't seem to support CDL. So if I see correctly, the only
> effective change introduced by the patch is the four scsi_cdl_check_cmd (and
> thus scsi_report_opcode) calls to check for CDL support. Hence we wondered
> whether these may be the cause of the issue. We ran a few tests to verify:
> 
> - disabling "REPORT SUPPORTED OPERATION CODES" by passing
>   `scsi_mod.dev_flags=WDC:REDACTED_SN:536870912` (the flag being
>   BLIST_NO_RSOC) resolves the issue (hotplug works again), but I imagine
>   disabling RSOC altogether isn't a good workaround. This test was not done
>   on a mainline kernel, but I don't think it would make a difference.
> 
> - we patched out the four calls to scsi_cdl_check_cmd and unconditionally set
>   cdl_supported to 0, see [2] for the patch (on top of 6.14.4). This resolves
>   the issue.
> 
> - I suspected that particularly the two latter scsi_cdl_check_cmd calls with a
>   nonzero service action might be problematic, so we patched them out
>   specifically but kept the other two calls without a service action, see [3]
>   for the patch (on top of 6.14.4). But with this patch, hotplug still does
>   not work.
> 
> - the RSOC commands themselves don't seem to be problematic per se. We asked
>   the user to boot a (non-mainline) kernel with the `scsi_mod.dev_flags`
>   parameter to disable RSOC as above, hotplug the disk (this succeeds), and
>   then query the four opcodes/service actions using `sg_opcodes`, and this
>   looks okay [4] (reporting that CDL is not supported).
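
A side note on the dev_flags value used in these tests: 536870912 is a single
set bit (1 << 29), which is the value given above for BLIST_NO_RSOC. A small
sketch for decoding such a value follows; the helper name flag_bit is made up
for illustration.

```shell
#!/bin/sh
# Print the bit index of a flag value that has exactly one bit set.
# flag_bit is a hypothetical helper, not part of any existing tool.
flag_bit() {
    v="$1"
    # Reject zero and anything with more than one bit set.
    if [ "$v" -le 0 ] || [ $((v & (v - 1))) -ne 0 ]; then
        echo "not a single-bit flag: $1" >&2
        return 1
    fi
    bit=0
    while [ "$v" -gt 1 ]; do
        v=$((v >> 1))
        bit=$((bit + 1))
    done
    echo "$bit"
}

flag_bit 536870912            # bit index of the quoted dev_flags value
printf '0x%x\n' 536870912     # same value in hex
```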
> 
> I wonder whether these results might suggest the RSOC queries are problematic
> not in general, but at this particular point (during device initialization) in
> this particular hardware setup? If this turns out to be the case -- would it be
> feasible to suppress these RSOC queries if CDL is not enabled via sysfs?
> 
> If you have any ideas for further troubleshooting, we're happy to gather more
> data. I'll be AFK for a few weeks, but Mira (in CC) will take over in the
> meantime.
> 
> Thanks!
> 
> Friedrich
> 
> [1] https://kernel.ubuntu.com/mainline/v6.14.4/
> 
> [2]
> 
> diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
> index a77e0499b738..022b2f9706a4 100644
> --- a/drivers/scsi/scsi.c
> +++ b/drivers/scsi/scsi.c
> @@ -658,11 +658,7 @@ void scsi_cdl_check(struct scsi_device *sdev)
>         }
> 
>         /* Check support for READ_16, WRITE_16, READ_32 and WRITE_32 commands */
> -       cdl_supported =
> -               scsi_cdl_check_cmd(sdev, READ_16, 0, buf) ||
> -               scsi_cdl_check_cmd(sdev, WRITE_16, 0, buf) ||
> -               scsi_cdl_check_cmd(sdev, VARIABLE_LENGTH_CMD, READ_32, buf) ||
> -               scsi_cdl_check_cmd(sdev, VARIABLE_LENGTH_CMD, WRITE_32, buf);
> +       cdl_supported = 0;
>         if (cdl_supported) {
>                 /*
>                  * We have CDL support: force the use of READ16/WRITE16.
> 
> [3]
> 
> diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
> index a77e0499b738..6b0f36f5415e 100644
> --- a/drivers/scsi/scsi.c
> +++ b/drivers/scsi/scsi.c
> @@ -660,9 +660,8 @@ void scsi_cdl_check(struct scsi_device *sdev)
>         /* Check support for READ_16, WRITE_16, READ_32 and WRITE_32 commands */
>         cdl_supported =
>                 scsi_cdl_check_cmd(sdev, READ_16, 0, buf) ||
> -               scsi_cdl_check_cmd(sdev, WRITE_16, 0, buf) ||
> -               scsi_cdl_check_cmd(sdev, VARIABLE_LENGTH_CMD, READ_32, buf) ||
> -               scsi_cdl_check_cmd(sdev, VARIABLE_LENGTH_CMD, WRITE_32, buf);
> +               scsi_cdl_check_cmd(sdev, WRITE_16, 0, buf);
> +       cdl_supported = 0;
>         if (cdl_supported) {
>                 /*
>                  * We have CDL support: force the use of READ16/WRITE16.
> 
> [4]
> 
> root@pbs-disklab:~# sg_opcodes -o 0x88 /dev/sdb
> 
> Opcode=0x88
> Command_name: Read(16)
> Command is supported [conforming to SCSI standard]
> No command duration limit mode page
> Multiple Logical Units (MLU): not reported
> Usage data: 88 fe ff ff ff ff ff ff ff ff ff ff ff ff 00 00
> 
> root@pbs-disklab:~# sg_opcodes -o 0x8a /dev/sdb
> 
> Opcode=0x8a
> Command_name: Write(16)
> Command is supported [conforming to SCSI standard]
> No command duration limit mode page
> Multiple Logical Units (MLU): not reported
> Usage data: 8a fa ff ff ff ff ff ff ff ff ff ff ff ff 00 00
> 
> root@pbs-disklab:~# sg_opcodes -o 0x7f,0x9 /dev/sdb
> 
> Opcode=0x7f  Service_action=0x0009
> Command_name: Read(32)
> Command is supported [conforming to SCSI standard]
> No command duration limit mode page
> Multiple Logical Units (MLU): not reported
> Usage data: 7f 00 00 00 00 00 00 ff 00 09 fe 00 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> 
> root@pbs-disklab:~# sg_opcodes -o 0x7f,0xb /dev/sdb
> 
> Opcode=0x7f  Service_action=0x000b
> Command_name: Write(32)
> Command is supported [conforming to SCSI standard]
> No command duration limit mode page
> Multiple Logical Units (MLU): not reported
> Usage data: 7f 00 00 00 00 00 00 ff 00 0b fa 00 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> 
