* [PATCH v4 1/2] scsi: hisi_sas: Enable force phy when SATA disk directly connected
2025-03-12 9:51 [PATCH v4 0/2] scsi: hisi_sas: Fix IO errors caused by hardware port ID changes Xingui Yang
@ 2025-03-12 9:51 ` Xingui Yang
2025-03-12 9:51 ` [PATCH v4 2/2] scsi: hisi_sas: Fix IO errors caused by hardware port ID changes Xingui Yang
` (2 subsequent siblings)
3 siblings, 0 replies; 6+ messages in thread
From: Xingui Yang @ 2025-03-12 9:51 UTC
To: john.g.garry, yanaijie
Cc: jejb, martin.petersen, linux-scsi, linuxarm, prime.zeng,
yangxingui, liuyonglong, kangfenglong, liyangyang20, f.fangjian,
xiabing14, zhonghaoquan
When a SATA disk is directly connected, the SAS controller uses the port
id in the DQ entry to determine which disk an I/O is delivered to. If many
phys are disconnected and immediately reconnected while I/O is in flight,
a phy's port id may change and the old port id may be reused by another
link. An I/O retried with the old port id may then be sent to the wrong
disk, causing data inconsistency on the SATA disk. Enable force phy so
that the command is executed only on a specific phy: if the actual phy id
of the port does not match the phy configured in the command, the chip
stops delivering the I/O to the disk.
Fixes: ce60689e12dd ("scsi: hisi_sas: add v3 code to send ATA frame")
Signed-off-by: Xingui Yang <yangxingui@huawei.com>
Reviewed-by: Yihang Li <liyihang9@huawei.com>
---
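For readers following the description above, the dw0 composition can be
sketched in standalone C as below. This is only an illustration, not driver
code: sketch_build_ata_dw0() and its parameters are hypothetical names, the
PHY_ID, FORCE_PHY and PORT offsets are copied from the v3 hunk in this patch,
and the CMD_HDR_CMD_OFF value of 29 is an assumption. Endianness conversion
(cpu_to_le32) is left out on purpose.

```c
#include <assert.h>
#include <stdint.h>

/* PHY_ID, FORCE_PHY and PORT offsets are taken from the v3 hunk in this patch. */
#define CMD_HDR_PHY_ID_OFF	8
#define CMD_HDR_PHY_ID_MSK	(0x1ffU << CMD_HDR_PHY_ID_OFF)
#define CMD_HDR_FORCE_PHY_OFF	17
#define CMD_HDR_FORCE_PHY_MSK	(0x1U << CMD_HDR_FORCE_PHY_OFF)
#define CMD_HDR_PORT_OFF	18
/* Assumption: the command type field starts at bit 29. */
#define CMD_HDR_CMD_OFF		29

/*
 * Hypothetical helper: build dw0 for an ATA command.
 * behind_expander != 0 models a disk attached through an expander,
 * where the phy is not forced.
 */
uint32_t sketch_build_ata_dw0(uint32_t port_id, unsigned int phy_id,
			      int behind_expander)
{
	uint32_t dw0 = port_id << CMD_HDR_PORT_OFF;

	if (behind_expander) {
		dw0 |= 3U << CMD_HDR_CMD_OFF;
		return dw0;
	}

	/* The phy field is a 9-bit, one-hot bitmap in this sketch. */
	assert(phy_id < 9);
	dw0 |= (1U << phy_id) << CMD_HDR_PHY_ID_OFF;
	/* Ask the chip to execute the command only on that phy. */
	dw0 |= CMD_HDR_FORCE_PHY_MSK;
	dw0 |= 4U << CMD_HDR_CMD_OFF;
	return dw0;
}
```

With port id 2 and phy 3 on a directly attached disk, the sketch sets the
port field, bit 11 (phy 3 in the bitmap), the force-phy bit and command
type 4. For the expander case only the port field and command type 3 are set.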
drivers/scsi/hisi_sas/hisi_sas_v2_hw.c | 9 +++++++--
drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 14 ++++++++++++--
2 files changed, 19 insertions(+), 4 deletions(-)
diff --git a/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c
index 71cd5b4450c2..7b0dcd80f5a8 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c
@@ -2501,6 +2501,7 @@ static void prep_ata_v2_hw(struct hisi_hba *hisi_hba,
struct hisi_sas_port *port = to_hisi_sas_port(sas_port);
struct sas_ata_task *ata_task = &task->ata_task;
struct sas_tmf_task *tmf = slot->tmf;
+ int phy_id;
u8 *buf_cmd;
int has_data = 0, hdr_tag = 0;
u32 dw0, dw1 = 0, dw2 = 0;
@@ -2508,10 +2509,14 @@ static void prep_ata_v2_hw(struct hisi_hba *hisi_hba,
/* create header */
/* dw0 */
dw0 = port->id << CMD_HDR_PORT_OFF;
- if (parent_dev && dev_is_expander(parent_dev->dev_type))
+ if (parent_dev && dev_is_expander(parent_dev->dev_type)) {
dw0 |= 3 << CMD_HDR_CMD_OFF;
- else
+ } else {
+ phy_id = device->phy->identify.phy_identifier;
+ dw0 |= (1U << phy_id) << CMD_HDR_PHY_ID_OFF;
+ dw0 |= CMD_HDR_FORCE_PHY_MSK;
dw0 |= 4 << CMD_HDR_CMD_OFF;
+ }
if (tmf && ata_task->force_phy) {
dw0 |= CMD_HDR_FORCE_PHY_MSK;
diff --git a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
index 48b95d9a7927..bb2142fd2c66 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
@@ -359,6 +359,10 @@
#define CMD_HDR_RESP_REPORT_MSK (0x1 << CMD_HDR_RESP_REPORT_OFF)
#define CMD_HDR_TLR_CTRL_OFF 6
#define CMD_HDR_TLR_CTRL_MSK (0x3 << CMD_HDR_TLR_CTRL_OFF)
+#define CMD_HDR_PHY_ID_OFF 8
+#define CMD_HDR_PHY_ID_MSK (0x1ff << CMD_HDR_PHY_ID_OFF)
+#define CMD_HDR_FORCE_PHY_OFF 17
+#define CMD_HDR_FORCE_PHY_MSK (0x1U << CMD_HDR_FORCE_PHY_OFF)
#define CMD_HDR_PORT_OFF 18
#define CMD_HDR_PORT_MSK (0xf << CMD_HDR_PORT_OFF)
#define CMD_HDR_PRIORITY_OFF 27
@@ -1429,15 +1433,21 @@ static void prep_ata_v3_hw(struct hisi_hba *hisi_hba,
struct hisi_sas_cmd_hdr *hdr = slot->cmd_hdr;
struct asd_sas_port *sas_port = device->port;
struct hisi_sas_port *port = to_hisi_sas_port(sas_port);
+ int phy_id;
u8 *buf_cmd;
int has_data = 0, hdr_tag = 0;
u32 dw1 = 0, dw2 = 0;
hdr->dw0 = cpu_to_le32(port->id << CMD_HDR_PORT_OFF);
- if (parent_dev && dev_is_expander(parent_dev->dev_type))
+ if (parent_dev && dev_is_expander(parent_dev->dev_type)) {
hdr->dw0 |= cpu_to_le32(3 << CMD_HDR_CMD_OFF);
- else
+ } else {
+ phy_id = device->phy->identify.phy_identifier;
+ hdr->dw0 |= cpu_to_le32((1U << phy_id)
+ << CMD_HDR_PHY_ID_OFF);
+ hdr->dw0 |= cpu_to_le32(CMD_HDR_FORCE_PHY_MSK);
hdr->dw0 |= cpu_to_le32(4U << CMD_HDR_CMD_OFF);
+ }
switch (task->data_dir) {
case DMA_TO_DEVICE:
--
2.33.0
* [PATCH v4 2/2] scsi: hisi_sas: Fix IO errors caused by hardware port ID changes
2025-03-12 9:51 [PATCH v4 0/2] scsi: hisi_sas: Fix IO errors caused by hardware port ID changes Xingui Yang
2025-03-12 9:51 ` [PATCH v4 1/2] scsi: hisi_sas: Enable force phy when SATA disk directly connected Xingui Yang
@ 2025-03-12 9:51 ` Xingui Yang
2025-03-20 15:37 ` [PATCH v4 0/2] " John Garry
2025-03-21 0:47 ` Martin K. Petersen
3 siblings, 0 replies; 6+ messages in thread
From: Xingui Yang @ 2025-03-12 9:51 UTC
To: john.g.garry, yanaijie
Cc: jejb, martin.petersen, linux-scsi, linuxarm, prime.zeng,
yangxingui, liuyonglong, kangfenglong, liyangyang20, f.fangjian,
xiabing14, zhonghaoquan
The hw port id of a phy may change when disks are inserted in batches,
leaving the port id in hisi_sas_port and the ITCT inconsistent with the
hardware and resulting in IO errors. To fix this, set the device state to
gone to block IO sent to the device, then execute a link reset so that the
disk is lost and found again and its information is updated.
Signed-off-by: Xingui Yang <yangxingui@huawei.com>
---
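The decision made on phy-up can be modelled in standalone C as below. This
is a simplified sketch under stated assumptions, not the driver
implementation: struct sketch_phy, enum sketch_action and sketch_phyup() are
hypothetical names that collapse the hisi_hba, hisi_sas_port and
domain_device state into plain fields, and the link reset is represented
only by a return value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical outcome of the phy-up handler in this model. */
enum sketch_action {
	SKETCH_CONTINUE_PHYUP,	/* go on with the normal phy-up handling */
	SKETCH_LINK_RESET,	/* device marked gone, link reset requested */
};

/* Hypothetical, flattened view of the state the handler looks at. */
struct sketch_phy {
	bool hba_resetting;	/* models HISI_SAS_RESETTING_BIT */
	bool has_port;		/* phy is already part of a port */
	bool direct_attached;	/* port device is not an expander */
	int cached_port_id;	/* port id the driver remembers */
	uint64_t hw_port_id;	/* port id the hardware reports now */
	bool dev_gone;		/* models SAS_DEV_GONE on the port device */
};

enum sketch_action sketch_phyup(struct sketch_phy *phy)
{
	assert(phy);

	if (!phy->hba_resetting && phy->has_port &&
	    (uint64_t)phy->cached_port_id != phy->hw_port_id &&
	    phy->direct_attached) {
		/* Block further IO to the device before re-discovering it. */
		phy->dev_gone = true;
		return SKETCH_LINK_RESET;
	}

	return SKETCH_CONTINUE_PHYUP;
}
```

In this model only a directly attached device whose remembered port id
differs from the hardware one is marked gone. An unchanged port id, an
expander-attached device, or a controller reset in progress all fall through
to the normal phy-up path.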
drivers/scsi/hisi_sas/hisi_sas_main.c | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)
diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c b/drivers/scsi/hisi_sas/hisi_sas_main.c
index da4a2ed8ee86..edb1efc241db 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_main.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
@@ -911,8 +911,28 @@ static void hisi_sas_phyup_work_common(struct work_struct *work,
container_of(work, typeof(*phy), works[event]);
struct hisi_hba *hisi_hba = phy->hisi_hba;
struct asd_sas_phy *sas_phy = &phy->sas_phy;
+ struct asd_sas_port *sas_port = sas_phy->port;
+ struct hisi_sas_port *port = phy->port;
+ struct device *dev = hisi_hba->dev;
+ struct domain_device *port_dev;
int phy_no = sas_phy->id;
+ if (!test_bit(HISI_SAS_RESETTING_BIT, &hisi_hba->flags) &&
+ sas_port && port && (port->id != phy->port_id)) {
+ dev_info(dev, "phy%d's hw port id changed from %d to %llu\n",
+ phy_no, port->id, phy->port_id);
+ port_dev = sas_port->port_dev;
+ if (port_dev && !dev_is_expander(port_dev->dev_type)) {
+ /*
+ * Set the device state to gone to block
+ * sending IO to the device.
+ */
+ set_bit(SAS_DEV_GONE, &port_dev->state);
+ hisi_sas_notify_phy_event(phy, HISI_PHYE_LINK_RESET);
+ return;
+ }
+ }
+
phy->wait_phyup_cnt = 0;
if (phy->identify.target_port_protocols == SAS_PROTOCOL_SSP)
hisi_hba->hw->sl_notify_ssp(hisi_hba, phy_no);
--
2.33.0
* Re: [PATCH v4 0/2] scsi: hisi_sas: Fix IO errors caused by hardware port ID changes
2025-03-12 9:51 [PATCH v4 0/2] scsi: hisi_sas: Fix IO errors caused by hardware port ID changes Xingui Yang
2025-03-12 9:51 ` [PATCH v4 1/2] scsi: hisi_sas: Enable force phy when SATA disk directly connected Xingui Yang
2025-03-12 9:51 ` [PATCH v4 2/2] scsi: hisi_sas: Fix IO errors caused by hardware port ID changes Xingui Yang
@ 2025-03-20 15:37 ` John Garry
2025-03-24 7:20 ` yangxingui
2025-03-21 0:47 ` Martin K. Petersen
3 siblings, 1 reply; 6+ messages in thread
From: John Garry @ 2025-03-20 15:37 UTC
To: Xingui Yang, yanaijie
Cc: jejb, martin.petersen, linux-scsi, linuxarm, prime.zeng,
liuyonglong, kangfenglong, liyangyang20, f.fangjian, xiabing14,
zhonghaoquan
On 12/03/2025 09:51, Xingui Yang wrote:
> This series of patches is used to solve the problem that IO may be sent to
> the incorrect disk after the HW port ID of the directly connected device
> is changed.
>
> Changes from v3:
> - Lose and find the disk when hw port id changes based on John's suggestion
>
> Changes from v2:
> - Use asynchronous scheduling
>
> Changes from v1:
> - Fix "BUG: Atomic scheduling in clear_itct_v3_hw()"
>
> Xingui Yang (2):
> scsi: hisi_sas: Enable force phy when SATA disk directly connected
> scsi: hisi_sas: Fix IO errors caused by hardware port ID changes
So this is all solved in the LLDD, then this is good
thanks
>
> drivers/scsi/hisi_sas/hisi_sas_main.c | 20 ++++++++++++++++++++
> drivers/scsi/hisi_sas/hisi_sas_v2_hw.c | 9 +++++++--
> drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 14 ++++++++++++--
> 3 files changed, 39 insertions(+), 4 deletions(-)
>
* Re: [PATCH v4 0/2] scsi: hisi_sas: Fix IO errors caused by hardware port ID changes
2025-03-20 15:37 ` [PATCH v4 0/2] " John Garry
@ 2025-03-24 7:20 ` yangxingui
0 siblings, 0 replies; 6+ messages in thread
From: yangxingui @ 2025-03-24 7:20 UTC
To: John Garry, yanaijie
Cc: jejb, martin.petersen, linux-scsi, linuxarm, prime.zeng,
liuyonglong, kangfenglong, liyangyang20, f.fangjian, xiabing14,
zhonghaoquan
On 2025/3/20 23:37, John Garry wrote:
> On 12/03/2025 09:51, Xingui Yang wrote:
>> This series of patches is used to solve the problem that IO may be
>> sent to
>> the incorrect disk after the HW port ID of the directly connected device
>> is changed.
>>
>> Changes from v3:
>> - Lose and find the disk when hw port id changes based on John's
>> suggestion
>>
>> Changes from v2:
>> - Use asynchronous scheduling
>>
>> Changes from v1:
>> - Fix "BUG: Atomic scheduling in clear_itct_v3_hw()"
>>
>> Xingui Yang (2):
>> scsi: hisi_sas: Enable force phy when SATA disk directly connected
>> scsi: hisi_sas: Fix IO errors caused by hardware port ID changes
>
> So this is all solved in the LLDD, then this is good
Yes, thank you for your advice.
Thanks,
Xingui
* Re: [PATCH v4 0/2] scsi: hisi_sas: Fix IO errors caused by hardware port ID changes
2025-03-12 9:51 [PATCH v4 0/2] scsi: hisi_sas: Fix IO errors caused by hardware port ID changes Xingui Yang
` (2 preceding siblings ...)
2025-03-20 15:37 ` [PATCH v4 0/2] " John Garry
@ 2025-03-21 0:47 ` Martin K. Petersen
3 siblings, 0 replies; 6+ messages in thread
From: Martin K. Petersen @ 2025-03-21 0:47 UTC
To: Xingui Yang
Cc: john.g.garry, yanaijie, jejb, martin.petersen, linux-scsi,
linuxarm, prime.zeng, liuyonglong, kangfenglong, liyangyang20,
f.fangjian, xiabing14, zhonghaoquan
Xingui,
> This series of patches is used to solve the problem that IO may be
> sent to the incorrect disk after the HW port ID of the directly
> connected device is changed.
Applied to 6.15/scsi-staging, thanks!
--
Martin K. Petersen Oracle Linux Engineering