* [PATCH] Revert "scsi: pm80xx: Do not use libsas port ID"
@ 2025-07-17 16:56 Niklas Cassel
2025-07-17 21:20 ` Igor Pylypiv
2025-07-18 4:36 ` Damien Le Moal
0 siblings, 2 replies; 12+ messages in thread
From: Niklas Cassel @ 2025-07-17 16:56 UTC (permalink / raw)
To: Jack Wang, James E.J. Bottomley, Martin K. Petersen
Cc: Igor Pylypiv, Terrence Adams, Damien Le Moal, Niklas Cassel,
linux-scsi
This reverts commit 0f630c58e31afb3dc2373bc1126b555f4b480bb2.
Commit 0f630c58e31a ("scsi: pm80xx: Do not use libsas port ID") causes
drives behind a SAS enclosure (which is connected to one of the ports
on the HBA) to no longer be detected.
Connecting the drives directly to the HBA still works. Thus, the commit
only broke the detection of drives behind a SAS enclosure.
Reverting the commit makes the drives behind the SAS enclosure to be
detected once again.
The commit log of the offending commit is quite vague, but mentions that:
"Remove sas_find_local_port_id(). We can use pm8001_ha->phy[phy_id].port
to get the port ID."
This assumption appears false, thus revert the offending commit.
Signed-off-by: Niklas Cassel <cassel@kernel.org>
---
dmesg after the offending commit, in case anyone is interested:
pm80xx0:: pm80xx_chip_phy_start_req 4641: PHY START REQ for phy_id 2
pm80xx0:: mpi_phy_start_resp 3340: phy start resp status:0x0, phyid:0x2
pm80xx0:: pm80xx_chip_phy_start_req 4641: PHY START REQ for phy_id 3
pm80xx0:: mpi_phy_start_resp 3340: phy start resp status:0x0, phyid:0x3
pm80xx0:: pm80xx_chip_phy_start_req 4641: PHY START REQ for phy_id 4
pm80xx0:: mpi_phy_start_resp 3340: phy start resp status:0x0, phyid:0x4
pm80xx0:: mpi_hw_event 3417: HW_EVENT_SAS_PHY_UP phyid:0x4 port_id:0x0
sas: phy-11:4 added to port-11:0, phy_mask:0x10 (50015b21409aefbf)
sas: DOING DISCOVERY on port 0, pid:215
pm80xx0:: pm80xx_chip_reg_dev_req 4742: register device req phy_id 0x0 port_id 0x0
pm80xx0:: pm8001_mpi_reg_resp 3309: register device status 0 phy_id 0x0 device_id 16385
sas: executing SMP task failed:-19
sas: RG to ex 50015b21409aefbf failed:0xffffffed
pm80xx0:: pm8001_chip_dereg_dev_req 4226: unregister device device_id 16385
sas: DONE DISCOVERY on port 0, pid:215, result:-19
pm80xx0:: pm80xx_chip_phy_start_req 4641: PHY START REQ for phy_id 5
pm80xx0:: mpi_phy_start_resp 3340: phy start resp status:0x0, phyid:0x5
pm80xx0:: mpi_hw_event 3417: HW_EVENT_SAS_PHY_UP phyid:0x5 port_id:0x0
sas: phy5 matched wide port0
sas: phy-11:5 added to port-11:0, phy_mask:0x30 (50015b21409aefbf)
sas: DOING DISCOVERY on port 0, pid:277
pm80xx0:: pm80xx_chip_reg_dev_req 4742: register device req phy_id 0x0 port_id 0x0
pm80xx0:: pm8001_mpi_reg_resp 3309: register device status 0 phy_id 0x0 device_id 16386
sas: executing SMP task failed:-19
sas: RG to ex 50015b21409aefbf failed:0xffffffed
pm80xx0:: pm8001_chip_dereg_dev_req 4226: unregister device device_id 16386
sas: DONE DISCOVERY on port 0, pid:277, result:-19
pm80xx0:: pm80xx_chip_phy_start_req 4641: PHY START REQ for phy_id 6
pm80xx0:: mpi_phy_start_resp 3340: phy start resp status:0x0, phyid:0x6
pm80xx0:: mpi_hw_event 3417: HW_EVENT_SAS_PHY_UP phyid:0x6 port_id:0x0
sas: phy6 matched wide port0
sas: phy-11:6 added to port-11:0, phy_mask:0x70 (50015b21409aefbf)
sas: DOING DISCOVERY on port 0, pid:277
pm80xx0:: pm80xx_chip_reg_dev_req 4742: register device req phy_id 0x0 port_id 0x0
pm80xx0:: pm8001_mpi_reg_resp 3309: register device status 0 phy_id 0x0 device_id 16387
sas: executing SMP task failed:-19
sas: RG to ex 50015b21409aefbf failed:0xffffffed
pm80xx0:: pm8001_chip_dereg_dev_req 4226: unregister device device_id 16387
sas: DONE DISCOVERY on port 0, pid:277, result:-19
pm80xx0:: pm80xx_chip_phy_start_req 4641: PHY START REQ for phy_id 7
pm80xx0:: mpi_phy_start_resp 3340: phy start resp status:0x0, phyid:0x7
pm80xx0:: mpi_hw_event 3417: HW_EVENT_SAS_PHY_UP phyid:0x7 port_id:0x0
sas: phy7 matched wide port0
sas: phy-11:7 added to port-11:0, phy_mask:0xf0 (50015b21409aefbf)
sas: DOING DISCOVERY on port 0, pid:215
pm80xx0:: pm80xx_chip_reg_dev_req 4742: register device req phy_id 0x0 port_id 0x0
pm80xx0:: pm8001_mpi_reg_resp 3309: register device status 0 phy_id 0x0 device_id 16388
sas: executing SMP task failed:-19
sas: RG to ex 50015b21409aefbf failed:0xffffffed
pm80xx0:: pm8001_chip_dereg_dev_req 4226: unregister device device_id 16388
sas: DONE DISCOVERY on port 0, pid:215, result:-19
drivers/scsi/pm8001/pm8001_sas.c | 21 +++++++++++++++++++--
1 file changed, 19 insertions(+), 2 deletions(-)
diff --git a/drivers/scsi/pm8001/pm8001_sas.c b/drivers/scsi/pm8001/pm8001_sas.c
index f7067878b34f..8cfdfb77d9b0 100644
--- a/drivers/scsi/pm8001/pm8001_sas.c
+++ b/drivers/scsi/pm8001/pm8001_sas.c
@@ -431,6 +431,23 @@ static int pm8001_task_prep_ssp(struct pm8001_hba_info *pm8001_ha,
return PM8001_CHIP_DISP->ssp_io_req(pm8001_ha, ccb);
}
+ /* Find the local port id that's attached to this device */
+static int sas_find_local_port_id(struct domain_device *dev)
+{
+ struct domain_device *pdev = dev->parent;
+
+ /* Directly attached device */
+ if (!pdev)
+ return dev->port->id;
+ while (pdev) {
+ struct domain_device *pdev_p = pdev->parent;
+ if (!pdev_p)
+ return pdev->port->id;
+ pdev = pdev->parent;
+ }
+ return 0;
+}
+
#define DEV_IS_GONE(pm8001_dev) \
((!pm8001_dev || (pm8001_dev->dev_type == SAS_PHY_UNUSED)))
@@ -503,10 +520,10 @@ int pm8001_queue_command(struct sas_task *task, gfp_t gfp_flags)
spin_lock_irqsave(&pm8001_ha->lock, flags);
pm8001_dev = dev->lldd_dev;
- port = pm8001_ha->phy[pm8001_dev->attached_phy].port;
+ port = &pm8001_ha->port[sas_find_local_port_id(dev)];
if (!internal_abort &&
- (DEV_IS_GONE(pm8001_dev) || !port || !port->port_attached)) {
+ (DEV_IS_GONE(pm8001_dev) || !port->port_attached)) {
ts->resp = SAS_TASK_UNDELIVERED;
ts->stat = SAS_PHY_DOWN;
if (sas_protocol_ata(task_proto)) {
--
2.50.1
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH] Revert "scsi: pm80xx: Do not use libsas port ID"
2025-07-17 16:56 [PATCH] Revert "scsi: pm80xx: Do not use libsas port ID" Niklas Cassel
@ 2025-07-17 21:20 ` Igor Pylypiv
2025-07-17 21:29 ` Niklas Cassel
2025-07-18 4:35 ` Damien Le Moal
2025-07-18 4:36 ` Damien Le Moal
1 sibling, 2 replies; 12+ messages in thread
From: Igor Pylypiv @ 2025-07-17 21:20 UTC (permalink / raw)
To: Niklas Cassel
Cc: Jack Wang, James E.J. Bottomley, Martin K. Petersen,
Terrence Adams, Damien Le Moal, linux-scsi
On Thu, Jul 17, 2025 at 06:56:07PM +0200, Niklas Cassel wrote:
> This reverts commit 0f630c58e31afb3dc2373bc1126b555f4b480bb2.
>
> Commit 0f630c58e31a ("scsi: pm80xx: Do not use libsas port ID") causes
> drives behind a SAS enclosure (which is connected to one of the ports
> on the HBA) to no longer be detected.
>
> Connecting the drives directly to the HBA still works. Thus, the commit
> only broke the detection of drives behind a SAS enclosure.
>
> Reverting the commit makes the drives behind the SAS enclosure to be
> detected once again.
>
> The commit log of the offending commit is quite vague, but mentions that:
> "Remove sas_find_local_port_id(). We can use pm8001_ha->phy[phy_id].port
> to get the port ID."
>
> This assumption appears false, thus revert the offending commit.
Thanks for bisecting and reverting the commit, Niklas!
Let me review the changes and send a proper fix that takes into account
drives behind a SAS enclosure. I would appretiate if you could test that
new fix since I don't have a setup with a SAS enclosure.
Thank you,
Igor
>
> Signed-off-by: Niklas Cassel <cassel@kernel.org>
Reviewed-by: Igor Pylypiv <ipylypiv@google.com>
> ---
> dmesg after the offending commit, in case anyone is interested:
> pm80xx0:: pm80xx_chip_phy_start_req 4641: PHY START REQ for phy_id 2
> pm80xx0:: mpi_phy_start_resp 3340: phy start resp status:0x0, phyid:0x2
> pm80xx0:: pm80xx_chip_phy_start_req 4641: PHY START REQ for phy_id 3
> pm80xx0:: mpi_phy_start_resp 3340: phy start resp status:0x0, phyid:0x3
> pm80xx0:: pm80xx_chip_phy_start_req 4641: PHY START REQ for phy_id 4
> pm80xx0:: mpi_phy_start_resp 3340: phy start resp status:0x0, phyid:0x4
> pm80xx0:: mpi_hw_event 3417: HW_EVENT_SAS_PHY_UP phyid:0x4 port_id:0x0
> sas: phy-11:4 added to port-11:0, phy_mask:0x10 (50015b21409aefbf)
> sas: DOING DISCOVERY on port 0, pid:215
> pm80xx0:: pm80xx_chip_reg_dev_req 4742: register device req phy_id 0x0 port_id 0x0
> pm80xx0:: pm8001_mpi_reg_resp 3309: register device status 0 phy_id 0x0 device_id 16385
> sas: executing SMP task failed:-19
> sas: RG to ex 50015b21409aefbf failed:0xffffffed
> pm80xx0:: pm8001_chip_dereg_dev_req 4226: unregister device device_id 16385
> sas: DONE DISCOVERY on port 0, pid:215, result:-19
> pm80xx0:: pm80xx_chip_phy_start_req 4641: PHY START REQ for phy_id 5
> pm80xx0:: mpi_phy_start_resp 3340: phy start resp status:0x0, phyid:0x5
> pm80xx0:: mpi_hw_event 3417: HW_EVENT_SAS_PHY_UP phyid:0x5 port_id:0x0
> sas: phy5 matched wide port0
> sas: phy-11:5 added to port-11:0, phy_mask:0x30 (50015b21409aefbf)
> sas: DOING DISCOVERY on port 0, pid:277
> pm80xx0:: pm80xx_chip_reg_dev_req 4742: register device req phy_id 0x0 port_id 0x0
> pm80xx0:: pm8001_mpi_reg_resp 3309: register device status 0 phy_id 0x0 device_id 16386
> sas: executing SMP task failed:-19
> sas: RG to ex 50015b21409aefbf failed:0xffffffed
> pm80xx0:: pm8001_chip_dereg_dev_req 4226: unregister device device_id 16386
> sas: DONE DISCOVERY on port 0, pid:277, result:-19
> pm80xx0:: pm80xx_chip_phy_start_req 4641: PHY START REQ for phy_id 6
> pm80xx0:: mpi_phy_start_resp 3340: phy start resp status:0x0, phyid:0x6
> pm80xx0:: mpi_hw_event 3417: HW_EVENT_SAS_PHY_UP phyid:0x6 port_id:0x0
> sas: phy6 matched wide port0
> sas: phy-11:6 added to port-11:0, phy_mask:0x70 (50015b21409aefbf)
> sas: DOING DISCOVERY on port 0, pid:277
> pm80xx0:: pm80xx_chip_reg_dev_req 4742: register device req phy_id 0x0 port_id 0x0
> pm80xx0:: pm8001_mpi_reg_resp 3309: register device status 0 phy_id 0x0 device_id 16387
> sas: executing SMP task failed:-19
> sas: RG to ex 50015b21409aefbf failed:0xffffffed
> pm80xx0:: pm8001_chip_dereg_dev_req 4226: unregister device device_id 16387
> sas: DONE DISCOVERY on port 0, pid:277, result:-19
> pm80xx0:: pm80xx_chip_phy_start_req 4641: PHY START REQ for phy_id 7
> pm80xx0:: mpi_phy_start_resp 3340: phy start resp status:0x0, phyid:0x7
> pm80xx0:: mpi_hw_event 3417: HW_EVENT_SAS_PHY_UP phyid:0x7 port_id:0x0
> sas: phy7 matched wide port0
> sas: phy-11:7 added to port-11:0, phy_mask:0xf0 (50015b21409aefbf)
> sas: DOING DISCOVERY on port 0, pid:215
> pm80xx0:: pm80xx_chip_reg_dev_req 4742: register device req phy_id 0x0 port_id 0x0
> pm80xx0:: pm8001_mpi_reg_resp 3309: register device status 0 phy_id 0x0 device_id 16388
> sas: executing SMP task failed:-19
> sas: RG to ex 50015b21409aefbf failed:0xffffffed
> pm80xx0:: pm8001_chip_dereg_dev_req 4226: unregister device device_id 16388
> sas: DONE DISCOVERY on port 0, pid:215, result:-19
>
> drivers/scsi/pm8001/pm8001_sas.c | 21 +++++++++++++++++++--
> 1 file changed, 19 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/scsi/pm8001/pm8001_sas.c b/drivers/scsi/pm8001/pm8001_sas.c
> index f7067878b34f..8cfdfb77d9b0 100644
> --- a/drivers/scsi/pm8001/pm8001_sas.c
> +++ b/drivers/scsi/pm8001/pm8001_sas.c
> @@ -431,6 +431,23 @@ static int pm8001_task_prep_ssp(struct pm8001_hba_info *pm8001_ha,
> return PM8001_CHIP_DISP->ssp_io_req(pm8001_ha, ccb);
> }
>
> + /* Find the local port id that's attached to this device */
> +static int sas_find_local_port_id(struct domain_device *dev)
> +{
> + struct domain_device *pdev = dev->parent;
> +
> + /* Directly attached device */
> + if (!pdev)
> + return dev->port->id;
> + while (pdev) {
> + struct domain_device *pdev_p = pdev->parent;
> + if (!pdev_p)
> + return pdev->port->id;
> + pdev = pdev->parent;
> + }
> + return 0;
> +}
> +
> #define DEV_IS_GONE(pm8001_dev) \
> ((!pm8001_dev || (pm8001_dev->dev_type == SAS_PHY_UNUSED)))
>
> @@ -503,10 +520,10 @@ int pm8001_queue_command(struct sas_task *task, gfp_t gfp_flags)
> spin_lock_irqsave(&pm8001_ha->lock, flags);
>
> pm8001_dev = dev->lldd_dev;
> - port = pm8001_ha->phy[pm8001_dev->attached_phy].port;
> + port = &pm8001_ha->port[sas_find_local_port_id(dev)];
>
> if (!internal_abort &&
> - (DEV_IS_GONE(pm8001_dev) || !port || !port->port_attached)) {
> + (DEV_IS_GONE(pm8001_dev) || !port->port_attached)) {
> ts->resp = SAS_TASK_UNDELIVERED;
> ts->stat = SAS_PHY_DOWN;
> if (sas_protocol_ata(task_proto)) {
> --
> 2.50.1
>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] Revert "scsi: pm80xx: Do not use libsas port ID"
2025-07-17 21:20 ` Igor Pylypiv
@ 2025-07-17 21:29 ` Niklas Cassel
2025-07-18 4:35 ` Damien Le Moal
1 sibling, 0 replies; 12+ messages in thread
From: Niklas Cassel @ 2025-07-17 21:29 UTC (permalink / raw)
To: Igor Pylypiv
Cc: Jack Wang, James E.J. Bottomley, Martin K. Petersen,
Terrence Adams, Damien Le Moal, linux-scsi
Hello Igor,
On 17 July 2025 23:20:53 CEST, Igor Pylypiv <ipylypiv@google.com> wrote:
>On Thu, Jul 17, 2025 at 06:56:07PM +0200, Niklas Cassel wrote:
>> This reverts commit 0f630c58e31afb3dc2373bc1126b555f4b480bb2.
>>
>> Commit 0f630c58e31a ("scsi: pm80xx: Do not use libsas port ID") causes
>> drives behind a SAS enclosure (which is connected to one of the ports
>> on the HBA) to no longer be detected.
>>
>> Connecting the drives directly to the HBA still works. Thus, the commit
>> only broke the detection of drives behind a SAS enclosure.
>>
>> Reverting the commit makes the drives behind the SAS enclosure to be
>> detected once again.
>>
>> The commit log of the offending commit is quite vague, but mentions that:
>> "Remove sas_find_local_port_id(). We can use pm8001_ha->phy[phy_id].port
>> to get the port ID."
>>
>> This assumption appears false, thus revert the offending commit.
>
>Thanks for bisecting and reverting the commit, Niklas!
>
>Let me review the changes and send a proper fix that takes into account
>drives behind a SAS enclosure. I would appretiate if you could test that
>new fix since I don't have a setup with a SAS enclosure.
Of course, I would gladly help with testing :)
Kind regards,
Niklas
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] Revert "scsi: pm80xx: Do not use libsas port ID"
2025-07-17 21:20 ` Igor Pylypiv
2025-07-17 21:29 ` Niklas Cassel
@ 2025-07-18 4:35 ` Damien Le Moal
2025-07-18 22:30 ` Igor Pylypiv
1 sibling, 1 reply; 12+ messages in thread
From: Damien Le Moal @ 2025-07-18 4:35 UTC (permalink / raw)
To: Igor Pylypiv, Niklas Cassel
Cc: Jack Wang, James E.J. Bottomley, Martin K. Petersen,
Terrence Adams, linux-scsi
On 2025/07/18 6:20, Igor Pylypiv wrote:
> On Thu, Jul 17, 2025 at 06:56:07PM +0200, Niklas Cassel wrote:
>> This reverts commit 0f630c58e31afb3dc2373bc1126b555f4b480bb2.
>>
>> Commit 0f630c58e31a ("scsi: pm80xx: Do not use libsas port ID") causes
>> drives behind a SAS enclosure (which is connected to one of the ports
>> on the HBA) to no longer be detected.
>>
>> Connecting the drives directly to the HBA still works. Thus, the commit
>> only broke the detection of drives behind a SAS enclosure.
>>
>> Reverting the commit makes the drives behind the SAS enclosure to be
>> detected once again.
>>
>> The commit log of the offending commit is quite vague, but mentions that:
>> "Remove sas_find_local_port_id(). We can use pm8001_ha->phy[phy_id].port
>> to get the port ID."
>>
>> This assumption appears false, thus revert the offending commit.
>
> Thanks for bisecting and reverting the commit, Niklas!
>
> Let me review the changes and send a proper fix that takes into account
> drives behind a SAS enclosure. I would appretiate if you could test that
> new fix since I don't have a setup with a SAS enclosure.
s/enclosure/expander
(the important point here is if there is an expander between the HBA and the
devices, not how the devices are installed. E.g. a simple enclosure may not have
any expander and act similar to a fan-out cable connection)
I think the issue is that if you do not have an expander and use fan-out cables
to connect drives directly to the HBA, you essentially get HBA SAS port ==
device port and it works (My setup is like this with an -8e model, so 8 ports, 0
to 7). That is the case for "if (!pdev)" in sas_find_local_port.
But if there is an expander, you can have multiple devices per HBA port, so you
need to search backward using the parent device until you hit the HBA device
itself, which gives you the port to use to communicate with the device.
So not sure if we can simplify/remove the loop in sas_find_local_port(). Maybe
if we add "local_port" field to struct domain_device ? But then any topology
change event would potentially need to update this.
--
Damien Le Moal
Western Digital Research
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] Revert "scsi: pm80xx: Do not use libsas port ID"
2025-07-17 16:56 [PATCH] Revert "scsi: pm80xx: Do not use libsas port ID" Niklas Cassel
2025-07-17 21:20 ` Igor Pylypiv
@ 2025-07-18 4:36 ` Damien Le Moal
1 sibling, 0 replies; 12+ messages in thread
From: Damien Le Moal @ 2025-07-18 4:36 UTC (permalink / raw)
To: Niklas Cassel, Jack Wang, James E.J. Bottomley,
Martin K. Petersen
Cc: Igor Pylypiv, Terrence Adams, linux-scsi
On 2025/07/18 1:56, Niklas Cassel wrote:
> This reverts commit 0f630c58e31afb3dc2373bc1126b555f4b480bb2.
>
> Commit 0f630c58e31a ("scsi: pm80xx: Do not use libsas port ID") causes
> drives behind a SAS enclosure (which is connected to one of the ports
> on the HBA) to no longer be detected.
>
> Connecting the drives directly to the HBA still works. Thus, the commit
> only broke the detection of drives behind a SAS enclosure.
>
> Reverting the commit makes the drives behind the SAS enclosure to be
> detected once again.
>
> The commit log of the offending commit is quite vague, but mentions that:
> "Remove sas_find_local_port_id(). We can use pm8001_ha->phy[phy_id].port
> to get the port ID."
>
> This assumption appears false, thus revert the offending commit.
>
> Signed-off-by: Niklas Cassel <cassel@kernel.org>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
--
Damien Le Moal
Western Digital Research
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] Revert "scsi: pm80xx: Do not use libsas port ID"
2025-07-18 4:35 ` Damien Le Moal
@ 2025-07-18 22:30 ` Igor Pylypiv
2025-07-22 1:28 ` Damien Le Moal
0 siblings, 1 reply; 12+ messages in thread
From: Igor Pylypiv @ 2025-07-18 22:30 UTC (permalink / raw)
To: Damien Le Moal
Cc: Niklas Cassel, Jack Wang, James E.J. Bottomley,
Martin K. Petersen, Terrence Adams, linux-scsi
On Fri, Jul 18, 2025 at 01:35:13PM +0900, Damien Le Moal wrote:
> On 2025/07/18 6:20, Igor Pylypiv wrote:
> > On Thu, Jul 17, 2025 at 06:56:07PM +0200, Niklas Cassel wrote:
> >> This reverts commit 0f630c58e31afb3dc2373bc1126b555f4b480bb2.
> >>
> >> Commit 0f630c58e31a ("scsi: pm80xx: Do not use libsas port ID") causes
> >> drives behind a SAS enclosure (which is connected to one of the ports
> >> on the HBA) to no longer be detected.
> >>
> >> Connecting the drives directly to the HBA still works. Thus, the commit
> >> only broke the detection of drives behind a SAS enclosure.
> >>
> >> Reverting the commit makes the drives behind the SAS enclosure to be
> >> detected once again.
> >>
> >> The commit log of the offending commit is quite vague, but mentions that:
> >> "Remove sas_find_local_port_id(). We can use pm8001_ha->phy[phy_id].port
> >> to get the port ID."
> >>
> >> This assumption appears false, thus revert the offending commit.
> >
> > Thanks for bisecting and reverting the commit, Niklas!
> >
> > Let me review the changes and send a proper fix that takes into account
> > drives behind a SAS enclosure. I would appretiate if you could test that
> > new fix since I don't have a setup with a SAS enclosure.
>
> s/enclosure/expander
> (the important point here is if there is an expander between the HBA and the
> devices, not how the devices are installed. E.g. a simple enclosure may not have
> any expander and act similar to a fan-out cable connection)
>
> I think the issue is that if you do not have an expander and use fan-out cables
> to connect drives directly to the HBA, you essentially get HBA SAS port ==
> device port and it works (My setup is like this with an -8e model, so 8 ports, 0
> to 7). That is the case for "if (!pdev)" in sas_find_local_port.
>
> But if there is an expander, you can have multiple devices per HBA port, so you
> need to search backward using the parent device until you hit the HBA device
> itself, which gives you the port to use to communicate with the device.
>
> So not sure if we can simplify/remove the loop in sas_find_local_port(). Maybe
> if we add "local_port" field to struct domain_device ? But then any topology
> change event would potentially need to update this.
>
Looks like there is no need for sas_find_local_port() because child devices
have a reference to the same port?
struct domain_device {
...
struct asd_sas_port *port; /* shortcut to root of the tree */
}
https://github.com/torvalds/linux/blob/c7de79e662b8681f54196c107281f1e63c26a3db/include/scsi/libsas.h#L168
During discovery, child devices inherit parent's port:
https://github.com/torvalds/linux/blob/c7de79e662b8681f54196c107281f1e63c26a3db/drivers/scsi/libsas/sas_expander.c#L826
and also
https://github.com/torvalds/linux/blob/c7de79e662b8681f54196c107281f1e63c26a3db/drivers/scsi/libsas/sas_expander.c#L936
Hey Niklas,
Could you try the following fix with your expander setup, please?
The fix assumes the problematic patch is not yet revered.
$ git diff
diff --git a/drivers/scsi/pm8001/pm8001_sas.c b/drivers/scsi/pm8001/pm8001_sas.c
index f7067878b34f..cd9513c23c71 100644
--- a/drivers/scsi/pm8001/pm8001_sas.c
+++ b/drivers/scsi/pm8001/pm8001_sas.c
@@ -503,7 +503,7 @@ int pm8001_queue_command(struct sas_task *task, gfp_t gfp_flags)
spin_lock_irqsave(&pm8001_ha->lock, flags);
pm8001_dev = dev->lldd_dev;
- port = pm8001_ha->phy[pm8001_dev->attached_phy].port;
+ port = dev->port->lldd_port;
if (!internal_abort &&
(DEV_IS_GONE(pm8001_dev) || !port || !port->port_attached)) {
Thank you!
Igor
>
> --
> Damien Le Moal
> Western Digital Research
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH] Revert "scsi: pm80xx: Do not use libsas port ID"
2025-07-18 22:30 ` Igor Pylypiv
@ 2025-07-22 1:28 ` Damien Le Moal
2025-07-22 4:25 ` Damien Le Moal
0 siblings, 1 reply; 12+ messages in thread
From: Damien Le Moal @ 2025-07-22 1:28 UTC (permalink / raw)
To: Igor Pylypiv
Cc: Niklas Cassel, Jack Wang, James E.J. Bottomley,
Martin K. Petersen, Terrence Adams, linux-scsi
On 7/19/25 7:30 AM, Igor Pylypiv wrote:
> Hey Niklas,
>
> Could you try the following fix with your expander setup, please?
> The fix assumes the problematic patch is not yet revered.
>
> $ git diff
> diff --git a/drivers/scsi/pm8001/pm8001_sas.c b/drivers/scsi/pm8001/pm8001_sas.c
> index f7067878b34f..cd9513c23c71 100644
> --- a/drivers/scsi/pm8001/pm8001_sas.c
> +++ b/drivers/scsi/pm8001/pm8001_sas.c
> @@ -503,7 +503,7 @@ int pm8001_queue_command(struct sas_task *task, gfp_t gfp_flags)
> spin_lock_irqsave(&pm8001_ha->lock, flags);
>
> pm8001_dev = dev->lldd_dev;
> - port = pm8001_ha->phy[pm8001_dev->attached_phy].port;
> + port = dev->port->lldd_port;
>
> if (!internal_abort &&
> (DEV_IS_GONE(pm8001_dev) || !port || !port->port_attached)) {
Igor,
I tested this, or rather, a variation of it that clean things up at the same time:
diff --git a/drivers/scsi/pm8001/pm8001_sas.c b/drivers/scsi/pm8001/pm8001_sas.c
index f7067878b34f..753c09363cbb 100644
--- a/drivers/scsi/pm8001/pm8001_sas.c
+++ b/drivers/scsi/pm8001/pm8001_sas.c
@@ -477,7 +477,7 @@ int pm8001_queue_command(struct sas_task *task, gfp_t
gfp_flags)
struct pm8001_device *pm8001_dev = dev->lldd_dev;
bool internal_abort = sas_is_internal_abort(task);
struct pm8001_hba_info *pm8001_ha;
- struct pm8001_port *port = NULL;
+ struct pm8001_port *port;
struct pm8001_ccb_info *ccb;
unsigned long flags;
u32 n_elem = 0;
@@ -502,8 +502,7 @@ int pm8001_queue_command(struct sas_task *task, gfp_t
gfp_flags)
spin_lock_irqsave(&pm8001_ha->lock, flags);
- pm8001_dev = dev->lldd_dev;
- port = pm8001_ha->phy[pm8001_dev->attached_phy].port;
+ port = dev->port->lldd_port;
if (!internal_abort &&
(DEV_IS_GONE(pm8001_dev) || !port || !port->port_attached)) {
And it works, I can see the drives in the enclosure behind the expander.
Care to send a proper path ?
I think this needs more testing though, especially special cases like yanking
the SAS cable and doing device hotplug/unplug. Will do that later today.
Thanks !
--
Damien Le Moal
Western Digital Research
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH] Revert "scsi: pm80xx: Do not use libsas port ID"
2025-07-22 1:28 ` Damien Le Moal
@ 2025-07-22 4:25 ` Damien Le Moal
2025-07-22 21:36 ` Igor Pylypiv
0 siblings, 1 reply; 12+ messages in thread
From: Damien Le Moal @ 2025-07-22 4:25 UTC (permalink / raw)
To: Igor Pylypiv
Cc: Niklas Cassel, Jack Wang, James E.J. Bottomley,
Martin K. Petersen, Terrence Adams, linux-scsi
On 7/22/25 10:28 AM, Damien Le Moal wrote:
> On 7/19/25 7:30 AM, Igor Pylypiv wrote:
>> Hey Niklas,
>>
>> Could you try the following fix with your expander setup, please?
>> The fix assumes the problematic patch is not yet revered.
>>
>> $ git diff
>> diff --git a/drivers/scsi/pm8001/pm8001_sas.c b/drivers/scsi/pm8001/pm8001_sas.c
>> index f7067878b34f..cd9513c23c71 100644
>> --- a/drivers/scsi/pm8001/pm8001_sas.c
>> +++ b/drivers/scsi/pm8001/pm8001_sas.c
>> @@ -503,7 +503,7 @@ int pm8001_queue_command(struct sas_task *task, gfp_t gfp_flags)
>> spin_lock_irqsave(&pm8001_ha->lock, flags);
>>
>> pm8001_dev = dev->lldd_dev;
>> - port = pm8001_ha->phy[pm8001_dev->attached_phy].port;
>> + port = dev->port->lldd_port;
>>
>> if (!internal_abort &&
>> (DEV_IS_GONE(pm8001_dev) || !port || !port->port_attached)) {
>
> Igor,
>
> I tested this, or rather, a variation of it that clean things up at the same time:
>
> diff --git a/drivers/scsi/pm8001/pm8001_sas.c b/drivers/scsi/pm8001/pm8001_sas.c
> index f7067878b34f..753c09363cbb 100644
> --- a/drivers/scsi/pm8001/pm8001_sas.c
> +++ b/drivers/scsi/pm8001/pm8001_sas.c
> @@ -477,7 +477,7 @@ int pm8001_queue_command(struct sas_task *task, gfp_t
> gfp_flags)
> struct pm8001_device *pm8001_dev = dev->lldd_dev;
> bool internal_abort = sas_is_internal_abort(task);
> struct pm8001_hba_info *pm8001_ha;
> - struct pm8001_port *port = NULL;
> + struct pm8001_port *port;
> struct pm8001_ccb_info *ccb;
> unsigned long flags;
> u32 n_elem = 0;
> @@ -502,8 +502,7 @@ int pm8001_queue_command(struct sas_task *task, gfp_t
> gfp_flags)
>
> spin_lock_irqsave(&pm8001_ha->lock, flags);
>
> - pm8001_dev = dev->lldd_dev;
> - port = pm8001_ha->phy[pm8001_dev->attached_phy].port;
> + port = dev->port->lldd_port;
>
> if (!internal_abort &&
> (DEV_IS_GONE(pm8001_dev) || !port || !port->port_attached)) {
>
>
> And it works, I can see the drives in the enclosure behind the expander.
> Care to send a proper path ?
>
> I think this needs more testing though, especially special cases like yanking
> the SAS cable and doing device hotplug/unplug. Will do that later today.
So I did that. And things are not pretty... Even a simple "rmmod pm80xx"
crashes the kernel on a bad pointer dereference (invalid port address). Same if
I hot-unplug drives from the enclosure. But that happens even with only Niklas
revert patch applied. So I think that is unrelated to this change.
That said, I will dig further to understand how the port pointers become
invalid, and make sure this change is OK. Note that there are no issues that I
can see when there is no expander (drives directly attached to the HBA).
Cheers.
--
Damien Le Moal
Western Digital Research
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] Revert "scsi: pm80xx: Do not use libsas port ID"
2025-07-22 4:25 ` Damien Le Moal
@ 2025-07-22 21:36 ` Igor Pylypiv
2025-07-23 1:38 ` Damien Le Moal
0 siblings, 1 reply; 12+ messages in thread
From: Igor Pylypiv @ 2025-07-22 21:36 UTC (permalink / raw)
To: Damien Le Moal
Cc: Niklas Cassel, Jack Wang, James E.J. Bottomley,
Martin K. Petersen, Terrence Adams, linux-scsi
On Tue, Jul 22, 2025 at 01:25:47PM +0900, Damien Le Moal wrote:
> On 7/22/25 10:28 AM, Damien Le Moal wrote:
> > On 7/19/25 7:30 AM, Igor Pylypiv wrote:
> >> Hey Niklas,
> >>
> >> Could you try the following fix with your expander setup, please?
> >> The fix assumes the problematic patch is not yet revered.
> >>
> >> $ git diff
> >> diff --git a/drivers/scsi/pm8001/pm8001_sas.c b/drivers/scsi/pm8001/pm8001_sas.c
> >> index f7067878b34f..cd9513c23c71 100644
> >> --- a/drivers/scsi/pm8001/pm8001_sas.c
> >> +++ b/drivers/scsi/pm8001/pm8001_sas.c
> >> @@ -503,7 +503,7 @@ int pm8001_queue_command(struct sas_task *task, gfp_t gfp_flags)
> >> spin_lock_irqsave(&pm8001_ha->lock, flags);
> >>
> >> pm8001_dev = dev->lldd_dev;
> >> - port = pm8001_ha->phy[pm8001_dev->attached_phy].port;
> >> + port = dev->port->lldd_port;
> >>
> >> if (!internal_abort &&
> >> (DEV_IS_GONE(pm8001_dev) || !port || !port->port_attached)) {
> >
> > Igor,
> >
> > I tested this, or rather, a variation of it that clean things up at the same time:
> >
> > diff --git a/drivers/scsi/pm8001/pm8001_sas.c b/drivers/scsi/pm8001/pm8001_sas.c
> > index f7067878b34f..753c09363cbb 100644
> > --- a/drivers/scsi/pm8001/pm8001_sas.c
> > +++ b/drivers/scsi/pm8001/pm8001_sas.c
> > @@ -477,7 +477,7 @@ int pm8001_queue_command(struct sas_task *task, gfp_t
> > gfp_flags)
> > struct pm8001_device *pm8001_dev = dev->lldd_dev;
> > bool internal_abort = sas_is_internal_abort(task);
> > struct pm8001_hba_info *pm8001_ha;
> > - struct pm8001_port *port = NULL;
> > + struct pm8001_port *port;
> > struct pm8001_ccb_info *ccb;
> > unsigned long flags;
> > u32 n_elem = 0;
> > @@ -502,8 +502,7 @@ int pm8001_queue_command(struct sas_task *task, gfp_t
> > gfp_flags)
> >
> > spin_lock_irqsave(&pm8001_ha->lock, flags);
> >
> > - pm8001_dev = dev->lldd_dev;
> > - port = pm8001_ha->phy[pm8001_dev->attached_phy].port;
> > + port = dev->port->lldd_port;
> >
> > if (!internal_abort &&
> > (DEV_IS_GONE(pm8001_dev) || !port || !port->port_attached)) {
> >
> >
> > And it works, I can see the drives in the enclosure behind the expander.
> > Care to send a proper path ?
> >
> > I think this needs more testing though, especially special cases like yanking
> > the SAS cable and doing device hotplug/unplug. Will do that later today.
>
> So I did that. And things are not pretty... Even a simple "rmmod pm80xx"
> crashes the kernel on a bad pointer dereference (invalid port address). Same if
> I hot-unplug drives from the enclosure. But that happens even with only Niklas
> revert patch applied. So I think that is unrelated to this change.
>
> That said, I will dig further to understand how the port pointers become
> invalid, and make sure this change is OK. Note that there are no issues that I
> can see when there is no expander (drives directly attached to the HBA).
Thank you for testing, Damien!
Just guessing, would defining the lldd_port_deformed() callback help?
The callback can set lldd_port to NULL if the problem is due to a dangling
lldd_port pointer.
Thanks,
Igor
>
> Cheers.
>
>
> --
> Damien Le Moal
> Western Digital Research
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] Revert "scsi: pm80xx: Do not use libsas port ID"
2025-07-22 21:36 ` Igor Pylypiv
@ 2025-07-23 1:38 ` Damien Le Moal
2025-08-02 18:31 ` Igor Pylypiv
0 siblings, 1 reply; 12+ messages in thread
From: Damien Le Moal @ 2025-07-23 1:38 UTC (permalink / raw)
To: Igor Pylypiv
Cc: Niklas Cassel, Jack Wang, James E.J. Bottomley,
Martin K. Petersen, Terrence Adams, linux-scsi
On 7/23/25 6:36 AM, Igor Pylypiv wrote:
>>> And it works, I can see the drives in the enclosure behind the expander.
>>> Care to send a proper path ?
>>>
>>> I think this needs more testing though, especially special cases like yanking
>>> the SAS cable and doing device hotplug/unplug. Will do that later today.
>>
>> So I did that. And things are not pretty... Even a simple "rmmod pm80xx"
>> crashes the kernel on a bad pointer dereference (invalid port address). Same if
>> I hot-unplug drives from the enclosure. But that happens even with only Niklas
>> revert patch applied. So I think that is unrelated to this change.
>>
>> That said, I will dig further to understand how the port pointers become
>> invalid, and make sure this change is OK. Note that there are no issues that I
>> can see when there is no expander (drives directly attached to the HBA).
>
> Thank you for testing, Damien!
>
> Just guessing, would defining the lldd_port_deformed() callback help?
> The callback can set lldd_port to NULL if the problem is due to a dangling
> lldd_port pointer.
Not sure if that is needed yet. The crash I am seeing is:
[56961.621080] BUG: unable to handle page fault for address: ff303500e1ee00ac
[56961.629527] #PF: supervisor read access in kernel mode
[56961.635315] #PF: error_code(0x0000) - not-present page
[56961.641102] PGD 95e001067 P4D 95e002067 PUD 0
[56961.646113] Oops: Oops: 0000 [#1] SMP
[56961.650244] CPU: 10 UID: 0 PID: 3373 Comm: rmmod Not tainted 6.16.0-rc7+
#380 PREEMPT(voluntary)
[56961.660238] Hardware name: Supermicro SYS-520P-WTR/X12SPW-TF, BIOS 1.2
02/14/2022
[56961.668664] RIP: 0010:do_raw_spin_lock+0xa/0xb0
[56961.673776] Code: ff 48 c7 03 00 00 00 00 48 c7 43 10 ff ff ff ff 48 c7 43
08 ed 1e af de 48 83 c4 10 5b 5d c3 90 66 0f 1f 00 0f 1f 44 00 00 53 <8b> 47 04
48 89 fb 3d ad 4e ad de 75 41 48 8b 43 10 65 48 3b 05 4d
[56961.694912] RSP: 0018:ff31e51a4f09fa60 EFLAGS: 00010082
[56961.700799] RAX: 0000000000000000 RBX: 0000000000000046 RCX: 0000000000000000
[56961.708841] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ff303500e1ee00a8
[56961.716880] RBP: ff303500e1ee00a8 R08: 0000000000000001 R09: 0000000000000000
[56961.724920] R10: 0000000000000000 R11: 0000000000000000 R12: ff303500e1ee0000
[56961.732958] R13: ff303504f4f64000 R14: ff303504e1ee0000 R15: ff303504d9f60000
[56961.740999] FS: 00007f126637a740(0000) GS:ff3035145d8ed000(0000)
knlGS:0000000000000000
[56961.750114] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[56961.756590] CR2: ff303500e1ee00ac CR3: 00000001f2f38001 CR4: 0000000000773ef0
[56961.765229] PKRU: 55555554
[56961.768887] Call Trace:
[56961.772249] <TASK>
[56961.775212] _raw_spin_lock_irqsave+0x41/0x50
[56961.780723] ? sas_ata_end_eh+0x2f/0x60 [libsas]
[56961.786532] sas_ata_end_eh+0x2f/0x60 [libsas]
[56961.792133] sas_unregister_common_dev+0xc3/0x1a0 [libsas]
[56961.798897] sas_destruct_devices+0x9f/0xc0 [libsas]
[56961.805073] sas_deform_port.cold+0x83/0x288 [libsas]
[56961.811330] sas_unregister_ports+0x36/0x50 [libsas]
[56961.817484] sas_unregister_ha+0x56/0x90 [libsas]
[56961.823343] pm8001_pci_remove+0x28/0x160 [pm80xx]
[56961.829315] pci_device_remove+0x44/0xb0
[56961.834287] device_release_driver_internal+0x1a4/0x210
[56961.840718] driver_detach+0x4b/0x90
[56961.845297] bus_remove_driver+0x70/0xf0
[56961.850265] pci_unregister_driver+0x2f/0xb0
[56961.855625] pm8001_exit+0x10/0x28 [pm80xx]
[56961.860909] __x64_sys_delete_module+0x19e/0x2d0
[56961.866650] do_syscall_64+0x92/0x380
[56961.871310] ? __x64_sys_close+0x3d/0x80
[56961.876246] ? trace_hardirqs_on_prepare+0x77/0xa0
[56961.882156] ? do_syscall_64+0x154/0x380
[56961.887081] ? do_fault+0x3ef/0x720
[56961.891510] ? ___pte_offset_map+0x43/0x1c0
[56961.896711] ? ___pte_offset_map+0x24/0x1c0
[56961.901901] ? __handle_mm_fault+0x5f2/0x1140
[56961.907278] ? ksys_read+0x79/0xf0
[56961.911570] ? lock_acquire+0x280/0x2e0
[56961.916339] ? lock_acquire+0x280/0x2e0
[56961.921103] ? lock_release+0x222/0x2d0
[56961.925865] ? rcu_read_unlock+0x1c/0x60
[56961.930720] ? lock_release+0x222/0x2d0
[56961.935476] ? do_user_addr_fault+0x368/0x6a0
[56961.940811] ? trace_hardirqs_off+0x40/0xb0
[56961.945936] entry_SYSCALL_64_after_hwframe+0x4b/0x53
[56961.952031] RIP: 0033:0x7f1265d02a2b
[56961.956444] Code: 73 01 c3 48 8b 0d dd 33 0f 00 f7 d8 64 89 01 48 83 c8 ff
c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01
f0 ff ff 73 01 c3 48 8b 0d ad 33 0f 00 f7 d8 64 89 01 48
[56961.978361] RSP: 002b:00007ffe899f06c8 EFLAGS: 00000202 ORIG_RAX:
00000000000000b0
[56961.987286] RAX: ffffffffffffffda RBX: 0000559d15a32750 RCX: 00007f1265d02a2b
[56961.995727] RDX: 0000000000000000 RSI: 0000000000000800 RDI: 0000559d15a327b8
[56962.004170] RBP: 00007ffe899f06f0 R08: 0000000000000000 R09: 0000000000000000
[56962.012612] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000
[56962.021058] R13: 00007ffe899f1701 R14: 00007ffe899f0940 R15: 0000000000000000
So it is here:
void sas_ata_end_eh(struct ata_port *ap)
{
struct domain_device *dev = ap->private_data;
struct sas_ha_struct *ha = dev->port->ha;
unsigned long flags;
spin_lock_irqsave(&ha->lock, flags);
...
Meaning that the ha pointer is broken...
Not sure why yet, but I get the exact same place for the crash if I hot-unplug
a drive from the enclosure. So something is wrong with removing devices when
there is an expander. Still digging into it.
--
Damien Le Moal
Western Digital Research
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] Revert "scsi: pm80xx: Do not use libsas port ID"
2025-07-23 1:38 ` Damien Le Moal
@ 2025-08-02 18:31 ` Igor Pylypiv
2025-08-04 0:03 ` Damien Le Moal
0 siblings, 1 reply; 12+ messages in thread
From: Igor Pylypiv @ 2025-08-02 18:31 UTC (permalink / raw)
To: Damien Le Moal
Cc: Niklas Cassel, Jack Wang, James E.J. Bottomley,
Martin K. Petersen, Terrence Adams, linux-scsi
On Wed, Jul 23, 2025 at 10:38:56AM +0900, Damien Le Moal wrote:
> On 7/23/25 6:36 AM, Igor Pylypiv wrote:
> >>> And it works, I can see the drives in the enclosure behind the expander.
> >>> Care to send a proper path ?
> >>>
> >>> I think this needs more testing though, especially special cases like yanking
> >>> the SAS cable and doing device hotplug/unplug. Will do that later today.
> >>
> >> So I did that. And things are not pretty... Even a simple "rmmod pm80xx"
> >> crashes the kernel on a bad pointer dereference (invalid port address). Same if
> >> I hot-unplug drives from the enclosure. But that happens even with only Niklas
> >> revert patch applied. So I think that is unrelated to this change.
> >>
> >> That said, I will dig further to understand how the port pointers become
> >> invalid, and make sure this change is OK. Note that there are no issues that I
> >> can see when there is no expander (drives directly attached to the HBA).
> >
> > Thank you for testing, Damien!
> >
> > Just guessing, would defining the lldd_port_deformed() callback help?
> > The callback can set lldd_port to NULL if the problem is due to a dangling
> > lldd_port pointer.
>
> Not sure if that is needed yet. The crash I am seeing is:
>
> [56961.621080] BUG: unable to handle page fault for address: ff303500e1ee00ac
> [56961.629527] #PF: supervisor read access in kernel mode
> [56961.635315] #PF: error_code(0x0000) - not-present page
> [56961.641102] PGD 95e001067 P4D 95e002067 PUD 0
> [56961.646113] Oops: Oops: 0000 [#1] SMP
> [56961.650244] CPU: 10 UID: 0 PID: 3373 Comm: rmmod Not tainted 6.16.0-rc7+
> #380 PREEMPT(voluntary)
> [56961.660238] Hardware name: Supermicro SYS-520P-WTR/X12SPW-TF, BIOS 1.2
> 02/14/2022
> [56961.668664] RIP: 0010:do_raw_spin_lock+0xa/0xb0
> [56961.673776] Code: ff 48 c7 03 00 00 00 00 48 c7 43 10 ff ff ff ff 48 c7 43
> 08 ed 1e af de 48 83 c4 10 5b 5d c3 90 66 0f 1f 00 0f 1f 44 00 00 53 <8b> 47 04
> 48 89 fb 3d ad 4e ad de 75 41 48 8b 43 10 65 48 3b 05 4d
> [56961.694912] RSP: 0018:ff31e51a4f09fa60 EFLAGS: 00010082
> [56961.700799] RAX: 0000000000000000 RBX: 0000000000000046 RCX: 0000000000000000
> [56961.708841] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ff303500e1ee00a8
> [56961.716880] RBP: ff303500e1ee00a8 R08: 0000000000000001 R09: 0000000000000000
> [56961.724920] R10: 0000000000000000 R11: 0000000000000000 R12: ff303500e1ee0000
> [56961.732958] R13: ff303504f4f64000 R14: ff303504e1ee0000 R15: ff303504d9f60000
> [56961.740999] FS: 00007f126637a740(0000) GS:ff3035145d8ed000(0000)
> knlGS:0000000000000000
> [56961.750114] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [56961.756590] CR2: ff303500e1ee00ac CR3: 00000001f2f38001 CR4: 0000000000773ef0
> [56961.765229] PKRU: 55555554
> [56961.768887] Call Trace:
> [56961.772249] <TASK>
> [56961.775212] _raw_spin_lock_irqsave+0x41/0x50
> [56961.780723] ? sas_ata_end_eh+0x2f/0x60 [libsas]
> [56961.786532] sas_ata_end_eh+0x2f/0x60 [libsas]
> [56961.792133] sas_unregister_common_dev+0xc3/0x1a0 [libsas]
> [56961.798897] sas_destruct_devices+0x9f/0xc0 [libsas]
> [56961.805073] sas_deform_port.cold+0x83/0x288 [libsas]
> [56961.811330] sas_unregister_ports+0x36/0x50 [libsas]
> [56961.817484] sas_unregister_ha+0x56/0x90 [libsas]
> [56961.823343] pm8001_pci_remove+0x28/0x160 [pm80xx]
> [56961.829315] pci_device_remove+0x44/0xb0
> [56961.834287] device_release_driver_internal+0x1a4/0x210
> [56961.840718] driver_detach+0x4b/0x90
> [56961.845297] bus_remove_driver+0x70/0xf0
> [56961.850265] pci_unregister_driver+0x2f/0xb0
> [56961.855625] pm8001_exit+0x10/0x28 [pm80xx]
> [56961.860909] __x64_sys_delete_module+0x19e/0x2d0
> [56961.866650] do_syscall_64+0x92/0x380
> [56961.871310] ? __x64_sys_close+0x3d/0x80
> [56961.876246] ? trace_hardirqs_on_prepare+0x77/0xa0
> [56961.882156] ? do_syscall_64+0x154/0x380
> [56961.887081] ? do_fault+0x3ef/0x720
> [56961.891510] ? ___pte_offset_map+0x43/0x1c0
> [56961.896711] ? ___pte_offset_map+0x24/0x1c0
> [56961.901901] ? __handle_mm_fault+0x5f2/0x1140
> [56961.907278] ? ksys_read+0x79/0xf0
> [56961.911570] ? lock_acquire+0x280/0x2e0
> [56961.916339] ? lock_acquire+0x280/0x2e0
> [56961.921103] ? lock_release+0x222/0x2d0
> [56961.925865] ? rcu_read_unlock+0x1c/0x60
> [56961.930720] ? lock_release+0x222/0x2d0
> [56961.935476] ? do_user_addr_fault+0x368/0x6a0
> [56961.940811] ? trace_hardirqs_off+0x40/0xb0
> [56961.945936] entry_SYSCALL_64_after_hwframe+0x4b/0x53
> [56961.952031] RIP: 0033:0x7f1265d02a2b
> [56961.956444] Code: 73 01 c3 48 8b 0d dd 33 0f 00 f7 d8 64 89 01 48 83 c8 ff
> c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01
> f0 ff ff 73 01 c3 48 8b 0d ad 33 0f 00 f7 d8 64 89 01 48
> [56961.978361] RSP: 002b:00007ffe899f06c8 EFLAGS: 00000202 ORIG_RAX:
> 00000000000000b0
> [56961.987286] RAX: ffffffffffffffda RBX: 0000559d15a32750 RCX: 00007f1265d02a2b
> [56961.995727] RDX: 0000000000000000 RSI: 0000000000000800 RDI: 0000559d15a327b8
> [56962.004170] RBP: 00007ffe899f06f0 R08: 0000000000000000 R09: 0000000000000000
> [56962.012612] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000
> [56962.021058] R13: 00007ffe899f1701 R14: 00007ffe899f0940 R15: 0000000000000000
>
> So it is here:
>
> void sas_ata_end_eh(struct ata_port *ap)
> {
> struct domain_device *dev = ap->private_data;
> struct sas_ha_struct *ha = dev->port->ha;
> unsigned long flags;
>
> spin_lock_irqsave(&ha->lock, flags);
> ...
>
> Meaning that the ha pointer is broken...
> Not sure why yet, but I get the exact same place for the crash if I hot-unplug
> a drive from the enclosure. So something is wrong with removing devices when
> there is an expander. Still digging into it.
>
Hi Damien,
Do you happen to have any updates regarding this crash?
Thanks,
Igor
> --
> Damien Le Moal
> Western Digital Research
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] Revert "scsi: pm80xx: Do not use libsas port ID"
2025-08-02 18:31 ` Igor Pylypiv
@ 2025-08-04 0:03 ` Damien Le Moal
0 siblings, 0 replies; 12+ messages in thread
From: Damien Le Moal @ 2025-08-04 0:03 UTC (permalink / raw)
To: Igor Pylypiv
Cc: Niklas Cassel, Jack Wang, James E.J. Bottomley,
Martin K. Petersen, Terrence Adams, linux-scsi
On 8/3/25 3:31 AM, Igor Pylypiv wrote:
> Hi Damien,
>
> Do you happen to have any updates regarding this crash?
No updates yet. Been busy with other things (merge window, regressions...)
Will try to get this fixed ASAP.
--
Damien Le Moal
Western Digital Research
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2025-08-04 0:05 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-07-17 16:56 [PATCH] Revert "scsi: pm80xx: Do not use libsas port ID" Niklas Cassel
2025-07-17 21:20 ` Igor Pylypiv
2025-07-17 21:29 ` Niklas Cassel
2025-07-18 4:35 ` Damien Le Moal
2025-07-18 22:30 ` Igor Pylypiv
2025-07-22 1:28 ` Damien Le Moal
2025-07-22 4:25 ` Damien Le Moal
2025-07-22 21:36 ` Igor Pylypiv
2025-07-23 1:38 ` Damien Le Moal
2025-08-02 18:31 ` Igor Pylypiv
2025-08-04 0:03 ` Damien Le Moal
2025-07-18 4:36 ` Damien Le Moal
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox