From: Diangang Li <lidiangang@bytedance.com>
To: Friedrich Weber <f.weber@proxmox.com>
Cc: Niklas Cassel <nks@flawful.org>, Jens Axboe <axboe@kernel.dk>,
	"Martin K. Petersen" <martin.petersen@oracle.com>,
	"James E.J. Bottomley" <jejb@linux.ibm.com>,
	Bart Van Assche <bvanassche@acm.org>,
	Christoph Hellwig <hch@lst.de>, Hannes Reinecke <hare@suse.de>,
	Damien Le Moal <dlemoal@kernel.org>,
	linux-scsi@vger.kernel.org, linux-ide@vger.kernel.org,
	linux-block@vger.kernel.org,
	Niklas Cassel <niklas.cassel@wdc.com>,
	Mira Limbeck <m.limbeck@proxmox.com>
Subject: Re: [PATCH v7 08/19] scsi: detect support for command duration limits
Date: Thu, 31 Jul 2025 19:38:06 +0800
Message-ID: <20250731113806.GA93929@bytedance.com>
In-Reply-To: <3dee186c-285e-4c1c-b879-6445eb2f3edf@proxmox.com>

On Wed, Apr 30, 2025 at 02:13:53PM +0200, Friedrich Weber wrote:
> Hi,
> 
> One of our users reports that, in their setup, hotplugging new disks doesn't
> work anymore with recent kernels (details below). The issue appeared somewhere
> between kernels 6.4 and 6.5, and they bisected the change to this patch:

Hi Friedrich,

I would like to confirm which hotplug method was used here. Is it the logical
operation, i.e. a device delete followed by a host rescan with the following
commands:

- echo 1 > /sys/block/sdX/device/delete
- echo - - - > /sys/class/scsi_host/host5/scan

or does it refer to physical hotplugging (physically removing and reinserting the drive)?

I have tested both the 3008 and 9500 HBAs using the delete and scan method, and both worked fine.
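
To be explicit about the first variant, here is a minimal sketch of that
delete and rescan cycle. The logical_hotplug function name and the SYSFS_ROOT
variable are illustrative additions, not an existing tool: SYSFS_ROOT defaults
to /sys and exists only so that the sequence can be dry-run against a scratch
directory. The disk and host names are passed as arguments.

```shell
#!/bin/sh
# Sketch of the "logical" hotplug cycle: delete one SCSI disk, then rescan
# the host it was attached to.
# Assumption: SYSFS_ROOT and logical_hotplug are hypothetical names used for
# illustration; on a real system SYSFS_ROOT is simply /sys.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

logical_hotplug() {
    disk="$1"   # block device name, e.g. sdb
    host="$2"   # SCSI host name, e.g. host5

    # Step 1: ask the SCSI midlayer to remove the device.
    echo 1 > "$SYSFS_ROOT/block/$disk/device/delete" || return 1

    # Step 2: wildcard rescan ("- - -" = all channels, targets and LUNs).
    echo "- - -" > "$SYSFS_ROOT/class/scsi_host/$host/scan" || return 1
}
```

On a live system this would be run as root, for example
"logical_hotplug sdb host5".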

> 
>   624885209f31 (scsi: core: Detect support for command duration limits)
> 
> The issue is also reproducible on a mainline kernel 6.14.4 build from [1]. When
> hotplugging a disk under 6.14.4, the following is logged (I've redacted some
> identifiers, let me know in case I've been too overzealous with that):
> 
> Apr 28 16:41:13 pbs-disklab kernel: mpt3sas_cm0: handle(0xa) sas_address(0xREDACTED_SAS_ADDR) port_type(0x1)
> Apr 28 16:41:13 pbs-disklab kernel: scsi 5:0:1:0: Direct-Access     WDC      REDACTED_SN  C5C0 PQ: 0 ANSI: 7
> Apr 28 16:41:13 pbs-disklab kernel: scsi 5:0:1:0: SSP: handle(0x000a), sas_addr(0xREDACTED_SAS_ADDR), phy(2), device_name(REDACTED_DEVICE_NAME)
> Apr 28 16:41:13 pbs-disklab kernel: scsi 5:0:1:0: enclosure logical id (REDACTED_LOGICAL_ID), slot(0) 
> Apr 28 16:41:13 pbs-disklab kernel: scsi 5:0:1:0: enclosure level(0x0000), connector name(     )
> Apr 28 16:41:13 pbs-disklab kernel: scsi 5:0:1:0: qdepth(254), tagged(1), scsi_level(8), cmd_que(1)
> Apr 28 16:41:13 pbs-disklab kernel: scsi 5:0:1:0: Power-on or device reset occurred
> Apr 28 16:41:16 pbs-disklab kernel: mpt3sas_cm0: log_info(0x31110e05): originator(PL), code(0x11), sub_code(0x0e05)
> Apr 28 16:41:18 pbs-disklab kernel: mpt3sas_cm0: log_info(0x31130000): originator(PL), code(0x13), sub_code(0x0000)
> Apr 28 16:41:18 pbs-disklab kernel: sd 5:0:1:0: Attached scsi generic sg1 type 0
> Apr 28 16:41:18 pbs-disklab kernel: sd 5:0:1:0: [sdb] Test Unit Ready failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
> Apr 28 16:41:18 pbs-disklab kernel: sd 5:0:1:0: [sdb] Read Capacity(16) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
> Apr 28 16:41:18 pbs-disklab kernel: sd 5:0:1:0: [sdb] Sense not available.
> Apr 28 16:41:18 pbs-disklab kernel: sd 5:0:1:0: [sdb] Read Capacity(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
> Apr 28 16:41:18 pbs-disklab kernel: sd 5:0:1:0: [sdb] Sense not available.
> Apr 28 16:41:18 pbs-disklab kernel: sd 5:0:1:0: [sdb] 0 512-byte logical blocks: (0 B/0 B)
> Apr 28 16:41:18 pbs-disklab kernel: sd 5:0:1:0: [sdb] 0-byte physical blocks
> Apr 28 16:41:18 pbs-disklab kernel: sd 5:0:1:0: [sdb] Test WP failed, assume Write Enabled
> Apr 28 16:41:18 pbs-disklab kernel: sd 5:0:1:0: [sdb] Asking for cache data failed
> Apr 28 16:41:18 pbs-disklab kernel: sd 5:0:1:0: [sdb] Assuming drive cache: write through
> Apr 28 16:41:18 pbs-disklab kernel:  end_device-5:1: add: handle(0x000a), sas_addr(0xREDACTED_SAS_ADDR)
> Apr 28 16:41:18 pbs-disklab kernel: mpt3sas_cm0: handle(0x000a), ioc_status(0x0022) failure at drivers/scsi/mpt3sas/mpt3sas_transport.c:225/_transport_set_identify()!
> Apr 28 16:41:18 pbs-disklab kernel: sd 5:0:1:0: [sdb] Attached SCSI disk
> Apr 28 16:41:18 pbs-disklab kernel: mpt3sas_cm0: mpt3sas_transport_port_remove: removed: sas_addr(0xREDACTED_SAS_ADDR)
> Apr 28 16:41:18 pbs-disklab kernel: mpt3sas_cm0: removing handle(0x000a), sas_addr(0xREDACTED_SAS_ADDR)
> Apr 28 16:41:18 pbs-disklab kernel: mpt3sas_cm0: enclosure logical id(REDACTED_LOGICAL_ID), slot(0)
> Apr 28 16:41:18 pbs-disklab kernel: mpt3sas_cm0: enclosure level(0x0000), connector name(     )
> 
> and the block device isn't accessible afterwards. It does seem to be visible
> after a reboot.
> 
> lspci on this host shows:
> 
> 02:00.0 Serial Attached SCSI controller [0107]: Broadcom / LSI SAS3008 PCI-Express Fusion-MPT SAS-3 [1000:0097] (rev 02)
> 	Subsystem: Broadcom / LSI SAS9300-8i [1000:30e0]
> 	Kernel driver in use: mpt3sas
> 	Kernel modules: mpt3sas
> 
> The HBA is placed on a PCIe 3.0 x8 slot (not bifurcated) and connected via
> SFF-8643 to a simple 2U 12xLFF SAS3 Supermicro box. The user can also reproduce
> the issue with other HBAs with e.g. the SAS3108 and SAS3816 chipsets.
> 
> The device doesn't seem to support CDL. So if I see correctly, the only
> effective change introduced by the patch is the four scsi_cdl_check_cmd (and
> thus scsi_report_opcode) calls to check for CDL support. Hence we wondered
> whether these calls may be the cause of the issue. We ran a few tests to
> verify:
> 
> - disabling "REPORT SUPPORTED OPERATION CODES" by passing
>   `scsi_mod.dev_flags=WDC:REDACTED_SN:536870912` (the flag being
>   BLIST_NO_RSOC) resolves the issue (hotplug works again), but I imagine
>   disabling RSOC altogether isn't a good workaround. This test was not done
>   on a mainline kernel, but I don't think it would make a difference.
> 
> - we patched out the four calls to scsi_cdl_check_cmd and unconditionally set
>   cdl_supported to 0, see [2] for the patch (on top of 6.14.4). This resolves
>   the issue.
> 
> - I suspected that particularly the two latter scsi_cdl_check_cmd calls with a
>   nonzero service action might be problematic, so we patched them out
>   specifically but kept the other two calls without a service action, see [3]
>   for the patch (on top of 6.14.4). But with this patch, hotplug still does
>   not work.
> 
> - the RSOC commands themselves don't seem to be problematic per se. We asked
>   the user to boot a (non-mainline) kernel with the `scsi_mod.dev_flags`
>   parameter to disable RSOC as above, hotplug the disk (this succeeds), and
>   then query the four opcodes/service actions using `sg_opcodes`, and this
>   looks okay [4] (reporting that CDL is not supported).
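
A side note on the dev_flags value used in the first test above: the decimal
536870912 is a single bit, 1 << 29, which the report identifies as
BLIST_NO_RSOC. A small sketch of the arithmetic follows; rsoc_bit and
flag_value are names made up for this example.

```shell
#!/bin/sh
# Show that the dev_flags value from the report is exactly one bit (bit 29).
# Assumption: rsoc_bit and flag_value are illustrative names only.
rsoc_bit=29
flag_value=$((1 << rsoc_bit))

echo "$flag_value"              # decimal form, as passed to scsi_mod.dev_flags
printf '0x%x\n' "$flag_value"   # same value in hex
```
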
> 
> I wonder whether these results might suggest the RSOC queries are problematic
> not in general, but at this particular point (during device initialization) in
> this particular hardware setup? If this turns out to be the case -- would it be
> feasible to suppress these RSOC queries if CDL is not enabled via sysfs?
> 
> If you have any ideas for further troubleshooting, we're happy to gather more
> data. I'll be AFK for a few weeks, but Mira (in CC) will take over in the
> meantime.
> 
> Thanks!
> 
> Friedrich
> 
> [1] https://kernel.ubuntu.com/mainline/v6.14.4/
> 
> [2]
> 
> diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
> index a77e0499b738..022b2f9706a4 100644
> --- a/drivers/scsi/scsi.c
> +++ b/drivers/scsi/scsi.c
> @@ -658,11 +658,7 @@ void scsi_cdl_check(struct scsi_device *sdev)
>         }
> 
>         /* Check support for READ_16, WRITE_16, READ_32 and WRITE_32 commands */
> -       cdl_supported =
> -               scsi_cdl_check_cmd(sdev, READ_16, 0, buf) ||
> -               scsi_cdl_check_cmd(sdev, WRITE_16, 0, buf) ||
> -               scsi_cdl_check_cmd(sdev, VARIABLE_LENGTH_CMD, READ_32, buf) ||
> -               scsi_cdl_check_cmd(sdev, VARIABLE_LENGTH_CMD, WRITE_32, buf);
> +       cdl_supported = 0;
>         if (cdl_supported) {
>                 /*
>                  * We have CDL support: force the use of READ16/WRITE16.
> 
> [3]
> 
> diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
> index a77e0499b738..6b0f36f5415e 100644
> --- a/drivers/scsi/scsi.c
> +++ b/drivers/scsi/scsi.c
> @@ -660,9 +660,8 @@ void scsi_cdl_check(struct scsi_device *sdev)
>         /* Check support for READ_16, WRITE_16, READ_32 and WRITE_32 commands */
>         cdl_supported =
>                 scsi_cdl_check_cmd(sdev, READ_16, 0, buf) ||
> -               scsi_cdl_check_cmd(sdev, WRITE_16, 0, buf) ||
> -               scsi_cdl_check_cmd(sdev, VARIABLE_LENGTH_CMD, READ_32, buf) ||
> -               scsi_cdl_check_cmd(sdev, VARIABLE_LENGTH_CMD, WRITE_32, buf);
> +               scsi_cdl_check_cmd(sdev, WRITE_16, 0, buf);
> +       cdl_supported = 0;
>         if (cdl_supported) {
>                 /*
>                  * We have CDL support: force the use of READ16/WRITE16.
> 
> [4]
> 
> root@pbs-disklab:~# sg_opcodes -o 0x88 /dev/sdb
> 
> Opcode=0x88
> Command_name: Read(16)
> Command is supported [conforming to SCSI standard]
> No command duration limit mode page
> Multiple Logical Units (MLU): not reported
> Usage data: 88 fe ff ff ff ff ff ff ff ff ff ff ff ff 00 00
> 
> root@pbs-disklab:~# sg_opcodes -o 0x8a /dev/sdb
> 
> Opcode=0x8a
> Command_name: Write(16)
> Command is supported [conforming to SCSI standard]
> No command duration limit mode page
> Multiple Logical Units (MLU): not reported
> Usage data: 8a fa ff ff ff ff ff ff ff ff ff ff ff ff 00 00
> 
> root@pbs-disklab:~# sg_opcodes -o 0x7f,0x9 /dev/sdb
> 
> Opcode=0x7f  Service_action=0x0009
> Command_name: Read(32)
> Command is supported [conforming to SCSI standard]
> No command duration limit mode page
> Multiple Logical Units (MLU): not reported
> Usage data: 7f 00 00 00 00 00 00 ff 00 09 fe 00 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> 
> root@pbs-disklab:~# sg_opcodes -o 0x7f,0xb /dev/sdb
> 
> Opcode=0x7f  Service_action=0x000b
> Command_name: Write(32)
> Command is supported [conforming to SCSI standard]
> No command duration limit mode page
> Multiple Logical Units (MLU): not reported
> Usage data: 7f 00 00 00 00 00 00 ff 00 0b fa 00 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> 
