From: John Garry <john.g.garry@oracle.com>
To: yangxingui <yangxingui@huawei.com>,
liyihang9@huawei.com, yanaijie@huawei.com
Cc: jejb@linux.ibm.com, martin.petersen@oracle.com,
linux-scsi@vger.kernel.org, linux-kernel@vger.kernel.org,
linuxarm@huawei.com, prime.zeng@huawei.com,
liuyonglong@huawei.com, kangfenglong@huawei.com,
liyangyang20@huawei.com, f.fangjian@huawei.com,
xiabing14@h-partners.com
Subject: Re: [PATCH v3 1/3] scsi: hisi_sas: Enable force phy when SATA disk directly connected
Date: Mon, 24 Feb 2025 17:34:07 +0000
Message-ID: <5d34595f-ff57-4679-b263-fa3fea006ce3@oracle.com>
In-Reply-To: <cc9ba6f8-1efb-4910-8952-9ca07c707658@huawei.com>
On 24/02/2025 13:12, yangxingui wrote:
> Hi, John
>
> On 2025/2/24 20:21, John Garry wrote:
>> On 24/02/2025 09:36, yangxingui wrote:
>>>>
>>>>
>>>> So do you mean that all IO to this disk will error? If yes, then
>>>> this is good.
>>> Yes, IO either fails or returns unexpected results. As shown in
>>> the log below, because of an abnormal port ID, the serial numbers
>>> read from the two disks are the same.
>>
>> Do you mean that this is mainline kernel behaviour, below:
> Yes
>>
>>>
>>> [448000.504979] hisi_sas_v3_hw 0000:d4:02.0: phyup: phy1
>>> link_rate=10(sata)
>>> [448000.505070] sas: phy-10:1 added to port-10:1, phy_mask:0x2
>>> (5000000000000a01)
>>> [448000.505247] sas: DOING DISCOVERY on port 1, pid:2239187
>>> [448000.505255] hisi_sas_v3_hw 0000:d4:02.0: dev[2:5] found
>>> [448000.505274] sas: Enter sas_scsi_recover_host busy: 0 failed: 0
>>> [448000.505295] sas: ata31: end_device-10:0: dev error handler
>>> [448000.505299] sas: ata32: end_device-10:1: dev error handler
>>> [448001.300517] hisi_sas_v3_hw 0000:d4:02.0: phydown: phy1
>>> phy_state=0x1 // phy1's hw port id released
>>> [448001.300522] hisi_sas_v3_hw 0000:d4:02.0: ignore flutter phy1 down
>>> [448001.436187] hisi_sas_v3_hw 0000:d4:02.0: phyup: phy2
>>> link_rate=10(sata) // phy2 occupies the hardware port ID of phy1
>>> [448001.608766] hisi_sas_v3_hw 0000:d4:02.0: phyup: phy1
>>> link_rate=10(sata) // phy1 was assigned a new hardware port ID
>>> [448001.775605] ata32.00: ATA-11: WUH721816ALE6L4, PCGAW660, max
>>> UDMA/133
>>> [448002.159364] sas: phy-10:2 added to port-10:2, phy_mask:0x4
>>> (5000000000000a02)
>>> [448002.159575] sas: DOING DISCOVERY on port 2, pid:2239187
>>> [448002.159581] hisi_sas_v3_hw 0000:d4:02.0: dev[3:5] found
>>> [448002.159602] sas: Enter sas_scsi_recover_host busy: 0 failed: 0
>>> [448002.159623] sas: ata31: end_device-10:0: dev error handler
>>> [448002.159633] sas: ata32: end_device-10:1: dev error handler
>>> [448002.159636] sas: ata33: end_device-10:2: dev error handler
>>> [448002.393349] hisi_sas_v3_hw 0000:d4:02.0: phydown: phy2 phy_state=0x3
>>> [448002.393354] hisi_sas_v3_hw 0000:d4:02.0: ignore flutter phy2 down
>>> [448002.684937] hisi_sas_v3_hw 0000:d4:02.0: phyup: phy2
>>> link_rate=10(sata)
>>> [448002.851639] ata33.00: ATA-11: WUH721816ALE6L4, PCGAW660, max
>>> UDMA/133
>>> [448002.851644] ata33.00: 31251759104 sectors, multi 0: LBA48 NCQ
>>> (depth 32)
>>>
>>>>
>>>> But I still don't like the handling in this patch. If we get a phy
>>>> up, then the directly-attached disk ideally should be gone already,
>>>> so we should not have to do this handling.
>>> There is no problem when the disk is removed. The current problem is
>>> that multiple phys come up at the same time. When one phy comes up
>>> and enters the error handler to execute a hard reset, that phy goes
>>> down and then comes up again. Another phy coming up in the meantime
>>> will probably occupy the hw port id of the phy that is doing the
>>> hard reset in EH.
>>
>> Could you do this work (itct update) in lldd_ata_check_ready CB?
>
> That is a good idea, but it only covers SATA disks, and the current
> problem is not limited to directly attached SATA disks. It also
> occasionally occurs with a SAS disk after the controller is reset.
> The following log is from a stress test that reproduced the issue
> with the current fix applied, even though we call
> hisi_sas_refresh_port_id() on controller reset.
>
> [ 5387.235015] hisi_sas_v3_hw 0000:74:02.0: I_T nexus reset: internal
> abort (-5)
> [ 5387.242126] sas: clear nexus ha
> [ 5387.245283] hisi_sas_v3_hw 0000:74:02.0: controller resetting...
> [ 5388.908489] hisi_sas_v3_hw 0000:74:02.0: phyup: phy5 link_rate=10(sata)
> [ 5388.915090] hisi_sas_v3_hw 0000:74:02.0: phyup: phy6 link_rate=10(sata)
> [ 5388.934505] hisi_sas_v3_hw 0000:74:02.0: phyup: phy0 link_rate=9(sata)
> [ 5388.941009] hisi_sas_v3_hw 0000:74:02.0: phyup: phy1 link_rate=9(sata)
> [ 5388.950976] hisi_sas_v3_hw 0000:74:02.0: phyup: phy4 link_rate=11
> [ 5388.957048] hisi_sas_v3_hw 0000:74:02.0: phyup: phy7 link_rate=11
> [ 5388.980097] hisi_sas_v3_hw 0000:74:02.0: phyup: phy2 link_rate=11
> [ 5388.986169] hisi_sas_v3_hw 0000:74:02.0: phyup: phy3 link_rate=11 //
> phy3 attached a sas disk.
> [ 5389.065103] hisi_sas_v3_hw 0000:74:02.0: task prep: SAS port1 not
> attach device
> [ 5389.072409] sas: executing TMF task failed 5000c500ae49c8f1 (-70)
> [ 5389.078492] hisi_sas_v3_hw 0000:74:02.0: task prep: SAS port1 not
> attach device
> [ 5389.085780] sas: executing TMF task failed 5000c500ae49c8f1 (-70)
> [ 5389.091861] hisi_sas_v3_hw 0000:74:02.0: task prep: SAS port1 not
> attach device
> [ 5389.099146] sas: executing TMF task failed 5000c500ae49c8f1 (-70)
> [ 5389.107419] hisi_sas_v3_hw 0000:74:02.0: controller reset complete //
> controller reset finished
> [ 5389.113686] hisi_sas_v3_hw 0000:74:02.0: phydown: phy0 phy_state=0xfe
> [ 5389.120099] hisi_sas_v3_hw 0000:74:02.0: ignore flutter phy0 down
> [ 5389.136399] hisi_sas_v3_hw 0000:74:02.0: phy3's hw port id changed
> from 1 to 7
> [ 5389.308114] hisi_sas_v3_hw 0000:74:02.0: phyup: phy0 link_rate=9(sata)
>
pm8001 reports link reset errors by calling
sas_notify_port_event(sas_phy, PORTE_LINK_RESET_ERR,) - can you consider
doing the same in hisi_sas_update_port_id() when you find an
inconsistent port id?
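
To make the idea concrete, here is a rough userspace model of the check,
not driver code. Everything prefixed mock_/MOCK_ is made up for
illustration, including the struct layout and the return value, and
mock_notify_port_event() only stands in for the real libsas call. It
assumes the driver keeps a cached port id per phy and can read the
current one from hardware.

```c
#include <stdio.h>

/* Userspace model only: none of these are real libsas/hisi_sas types. */
enum mock_port_event { MOCK_PORTE_NONE, MOCK_PORTE_LINK_RESET_ERR };

struct mock_phy {
	int id;
	int cached_port_id;              /* port id software believes the phy has */
	int hw_port_id;                  /* port id hardware currently reports */
	enum mock_port_event last_event; /* last event raised for this phy */
};

/* Stand-in for the libsas notification; it only records and prints. */
static void mock_notify_port_event(struct mock_phy *phy,
				   enum mock_port_event ev)
{
	phy->last_event = ev;
	printf("phy%d: port event %d\n", phy->id, (int)ev);
}

/*
 * Hypothetical shape of the check: on a mismatch, raise a link reset
 * error rather than patching per-device state in place.
 * Returns 1 if an event was raised, 0 if the ids already agree.
 */
static int mock_update_port_id(struct mock_phy *phy)
{
	if (phy->cached_port_id == phy->hw_port_id)
		return 0;

	printf("phy%d's hw port id changed from %d to %d\n",
	       phy->id, phy->cached_port_id, phy->hw_port_id);
	mock_notify_port_event(phy, MOCK_PORTE_LINK_RESET_ERR);
	return 1;
}
```

With the phy3 case from your log (cached id 1, hardware id 7) the model
raises the event; a phy whose ids agree is left alone. Whether the real
event handling then recovers the device cleanly is exactly what would
need testing.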