From: Friedrich Weber <f.weber@proxmox.com>
To: Damien Le Moal <dlemoal@kernel.org>,
Mira Limbeck <m.limbeck@proxmox.com>,
Niklas Cassel <nks@flawful.org>, Jens Axboe <axboe@kernel.dk>,
"Martin K. Petersen" <martin.petersen@oracle.com>,
"James E.J. Bottomley" <jejb@linux.ibm.com>,
Kashyap Desai <kashyap.desai@broadcom.com>,
Sumit Saxena <sumit.saxena@broadcom.com>,
Shivasharan S <shivasharan.srikanteshwara@broadcom.com>,
Chandrakanth patil <chandrakanth.patil@broadcom.com>,
Sathya Prakash Veerichetty <sathya.prakash@broadcom.com>,
Sreekanth Reddy <sreekanth.reddy@broadcom.com>,
megaraidlinux.pdl@broadcom.com, mpi3mr-linuxdrv.pdl@broadcom.com
Cc: Bart Van Assche <bvanassche@acm.org>,
Christoph Hellwig <hch@lst.de>, Hannes Reinecke <hare@suse.de>,
linux-scsi@vger.kernel.org, linux-ide@vger.kernel.org,
linux-block@vger.kernel.org,
Niklas Cassel <niklas.cassel@wdc.com>
Subject: Re: [PATCH v7 08/19] scsi: detect support for command duration limits
Date: Thu, 10 Jul 2025 10:41:43 +0200
Message-ID: <72bf0fd7-f646-46f7-a2aa-ef815dbfa4e2@proxmox.com>
In-Reply-To: <54e0a717-e9fc-4534-bc27-8bc1ee745048@kernel.org>
Hi Damien,
On 09/06/2025 14:24, Damien Le Moal wrote:
> On 6/3/25 20:28, Friedrich Weber wrote:
>>>> They provided controller information via `sas3ircu` and `storcli`:
>>>>
>>>> sas3ircu:
>>>>
>>>> Controller type : SAS3008
>>>> BIOS version : 8.37.00.00
>>>> Firmware version : 16.00.16.00
>>>
>>> Is this the latest available FW for this HBA ? (see below)
>>
>> It seems 16.00.16.00 is even newer than the latest version available on
>> the Broadcom website, which is a bit strange -- I only found [1] there
>> which has an older 16.00.14.00 (3008_FW_PH16.00.14.00.rar).
>
> So this is an old/now EOL 9300 series HBA, right ? Or is this a 3008 controller
> chip as part of the server motherboard (e.g. a supermicro HBA ?)
> Looking at the Broadcom support page for legacy products, the latest FW version
> seems to be 16.00.10.00.
According to the user it is not part of a server motherboard but a
"proper" PCIe Broadcom SAS 9300-8i HBA.
>
>>>> storcli:
>>>>
>>>> Firmware Package Build = 24.18.0-0021
>>>> Firmware Version = 4.670.00-6500
>>>> CPLD Version = 26515-00A
>>>> Bios Version = 6.34.01.0_4.19.08.00_0x06160200
>>>> HII Version = 03.23.06.00
>>>> Ctrl-R Version = 5.18-0400
>>>> Preboot CLI Version = 01.07-05:#%0000
>>>> NVDATA Version = 3.1611.00-0005
>>>> Boot Block Version = 3.07.00.00-0003
>>>> Driver Name = megaraid_sas
>>>> Driver Version = 07.727.03.00-rc1
>>>
>>> Unfortunately, I do not have any megaraid model so I cannot test/recreate. I
>>> only have mpt3sas (9300, 9400 and 9500 series HBAs) and mpi3mr models (9600 HBA
>>> series).
>>
>> We just realized this is actually the firmware information for a
>> different unrelated controller on the same host (a LSI MegaRAID SAS-3
>> 3108 using the megaraid_sas driver). But the megaraid_sas one is not
>> used in our tests, so please ignore the storcli output we provided.
>> Sorry for the confusion.
>>
>> The controller we're testing with is the SAS3008 I mentioned initially,
>> with firmware version 16.00.16.00 as reported by sas3ircu above.
>
> I do not have this FW... Not sure what the HBA itself is too. I only have some
> Broadcom 9300-XX HBAs that have the 3008 controller.
>
>> FWIW, the user reports they have also seen the same issue with a
>> SAS3-9500-8e Tri-mode HBA.
>
> This one had a FW update last month or so. So checking the latest is required.
Yeah, I agree that checking the latest firmware makes sense for these.
Unfortunately, they are currently in use, so the user cannot test with
them. But we might soon be able to run some tests with a Supermicro
AOC-S3816L-L16iT (so Broadcom SAS3816?), where the hotplug issue
apparently also happens. We'll make sure to update to the latest
firmware, and I'll do my best to collect relevant logs. If you can think
of anything specific we should collect, feel free to let me know.
>
>>>> And the disk information from `smartctl --xall`
>>>>
>>>> 20T:
>>>>
>>>> === START OF INFORMATION SECTION ===
>>>> Vendor: WDC
>>>> Product: WUH722020BL5204
>
> ...
>
>>>> Product: WUH721818AL5204
>
> I have these. I will try to check. But again, I seriously doubt this has
> anything to do with the drives since these do not support CDL, nor do the HBAs
> you listed. None of them support CDL so calling scsi_report_opcode() for
> checking CDL, we should always see the HBA SAT return "CDL not supported".
>
>
>>> I do not think that the drives are relevant for this issue. How the HBA reacts
>>> to a command error from the drive resulting from the HBA command translation
>>> is likely the issue.
>>
>> I see, but it is certainly strange that 18T vs 20T drives do seem to
>> make a difference (hotplug works with 18T and doesn't work with 20T).
>
> Probably a timing difference since these drives are not the same generation.
> They have different timing on scan.
Right, that would make sense. Cold boot appears to work, so it sounds
plausible that the issue is timing-related.
>
>>>> If you need any additional information, please let us know!
>>>
>>> Adding the Broadcom folks to this thread, since as suspected, this seems to be
>>> an HBA issue. I strongly suspect that it relates to a recent very similar issue
>>> I have seen with the mpi3mr driver and a 9600 Broadcom HBA: any hotplug of a
>>> drive would completely crash the HBA and a full power cycle was needed to
>>> recover. A simple reboot would not be sufficient. I think the latest HBA FW
>>> version fixes that problem.
>>>
>>> Broadcom team,
>>>
>>> Any comment ?
>
> Broadcom ? Would you care to comment ?
>
> At this point, I have no idea what is going on. My hunch is that it is the HBA
> SAT misbehaving. But that is only a hunch. To prove it, we would likely need a
> bus trace and have Broadcom look at HBA logs (which can be extracted using
> storcli). All of this likely means involving the technical support of the vendors.
>
Thanks for looking into this. It is definitely a strange problem.
Considering these drives don't support CDL anyway: do you think it would
be possible to provide an "escape hatch" that disables only the CDL checks
(a module parameter?), so that hotplug works again for the user's
devices? If I understand correctly, disabling just the CDL checks is
currently not possible without recompiling the kernel. scsi_mod.dev_flags
can be used to disable RSOC (REPORT SUPPORTED OPERATION CODES), but I
guess that has other unintended consequences too, so a more "targeted"
escape hatch would be nice.
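For reference, the existing (untargeted) workaround could be sketched as
follows. This is only an illustration under stated assumptions: that
scsi_mod.dev_flags takes comma-separated vendor:model:flags entries, that the
flags value may be given in hex, and that BLIST_NO_RSOC is bit 29 as in
include/scsi/scsi_devinfo.h of recent kernels. The helper names
dev_flags_entry() and dev_flags_param() are hypothetical, and the values
should be verified against the running kernel's source before use.

```python
# Assumption: bit position of BLIST_NO_RSOC, taken from
# include/scsi/scsi_devinfo.h in recent kernels. Verify against your tree.
BLIST_NO_RSOC = 1 << 29


def dev_flags_entry(vendor: str, model: str, flags: int) -> str:
    """Format one vendor:model:flags entry (hypothetical helper)."""
    for field in (vendor, model):
        # ':' and ',' are separators in the parameter syntax assumed here.
        if ":" in field or "," in field:
            raise ValueError(f"separator character in {field!r}")
    return f"{vendor}:{model}:{flags:#x}"


def dev_flags_param(entries) -> str:
    """Join entries into a kernel command line fragment (hypothetical helper)."""
    return "scsi_mod.dev_flags=" + ",".join(dev_flags_entry(*e) for e in entries)


if __name__ == "__main__":
    # Vendor and product strings as reported by smartctl for the 20T drive.
    print(dev_flags_param([("WDC", "WUH722020BL5204", BLIST_NO_RSOC)]))
```

Note that this disables RSOC for the matched device entirely, not only the
CDL detection, which is exactly why a more targeted switch would be
preferable.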
Best,
Friedrich