* [PATCH v2 0/4] bugfix some issues under abnormal scenarios.
@ 2026-01-22 2:02 Longfang Liu
2026-01-22 2:02 ` [PATCH v2 1/4] hisi_acc_vfio_pci: fix VF reset timeout issue Longfang Liu
` (4 more replies)
0 siblings, 5 replies; 7+ messages in thread
From: Longfang Liu @ 2026-01-22 2:02 UTC
To: alex.williamson, jgg, jonathan.cameron; +Cc: kvm, linux-kernel, liulongfang
This series fixes driver issues found in certain reset scenarios, repeated
migration scenarios, and error injection scenarios, where the device driver
must continue to function properly.
Change v1 -> v2
Fix the reset state handling issue
Longfang Liu (3):
hisi_acc_vfio_pci: update status after RAS error
hisi_acc_vfio_pci: resolve duplicate migration states
hisi_acc_vfio_pci: fix the queue parameter anomaly issue
Weili Qian (1):
hisi_acc_vfio_pci: fix VF reset timeout issue
.../vfio/pci/hisilicon/hisi_acc_vfio_pci.c | 30 +++++++++++++++++--
.../vfio/pci/hisilicon/hisi_acc_vfio_pci.h | 2 ++
2 files changed, 29 insertions(+), 3 deletions(-)
--
2.33.0
* [PATCH v2 1/4] hisi_acc_vfio_pci: fix VF reset timeout issue
2026-01-22 2:02 [PATCH v2 0/4] bugfix some issues under abnormal scenarios Longfang Liu
@ 2026-01-22 2:02 ` Longfang Liu
2026-01-22 2:02 ` [PATCH v2 2/4] hisi_acc_vfio_pci: update status after RAS error Longfang Liu
` (3 subsequent siblings)
4 siblings, 0 replies; 7+ messages in thread
From: Longfang Liu @ 2026-01-22 2:02 UTC
To: alex.williamson, jgg, jonathan.cameron; +Cc: kvm, linux-kernel, liulongfang
From: Weili Qian <qianweili@huawei.com>
If a device error occurs during live migration, QEMU resets the VF.
The VF reset and the device reset then run simultaneously, and the
VF reset times out. Therefore, use the QM_RESETTING flag to ensure
that the VF reset and the device reset are performed serially.
Fixes: b0eed085903e ("hisi_acc_vfio_pci: Add support for VFIO live migration")
Signed-off-by: Weili Qian <qianweili@huawei.com>
---
.../vfio/pci/hisilicon/hisi_acc_vfio_pci.c | 24 +++++++++++++++++++
.../vfio/pci/hisilicon/hisi_acc_vfio_pci.h | 2 ++
2 files changed, 26 insertions(+)
diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
index cf45f6370c36..d1e8053640a9 100644
--- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
+++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
@@ -1188,9 +1188,32 @@ hisi_acc_vfio_pci_get_device_state(struct vfio_device *vdev,
return 0;
}
+static void hisi_acc_vf_pci_reset_prepare(struct pci_dev *pdev)
+{
+ struct hisi_acc_vf_core_device *hisi_acc_vdev = hisi_acc_drvdata(pdev);
+ struct hisi_qm *qm = hisi_acc_vdev->pf_qm;
+ struct device *dev = &qm->pdev->dev;
+ u32 delay = 0;
+
+ /* All reset requests need to be queued for processing */
+ while (test_and_set_bit(QM_RESETTING, &qm->misc_ctl)) {
+ msleep(1);
+ if (++delay > QM_RESET_WAIT_TIMEOUT) {
+ dev_err(dev, "reset prepare failed\n");
+ return;
+ }
+ }
+
+ hisi_acc_vdev->set_reset_flag = true;
+}
+
static void hisi_acc_vf_pci_aer_reset_done(struct pci_dev *pdev)
{
struct hisi_acc_vf_core_device *hisi_acc_vdev = hisi_acc_drvdata(pdev);
+ struct hisi_qm *qm = hisi_acc_vdev->pf_qm;
+
+ if (hisi_acc_vdev->set_reset_flag)
+ clear_bit(QM_RESETTING, &qm->misc_ctl);
if (hisi_acc_vdev->core_device.vdev.migration_flags !=
VFIO_MIGRATION_STOP_COPY)
@@ -1734,6 +1757,7 @@ static const struct pci_device_id hisi_acc_vfio_pci_table[] = {
MODULE_DEVICE_TABLE(pci, hisi_acc_vfio_pci_table);
static const struct pci_error_handlers hisi_acc_vf_err_handlers = {
+ .reset_prepare = hisi_acc_vf_pci_reset_prepare,
.reset_done = hisi_acc_vf_pci_aer_reset_done,
.error_detected = vfio_pci_core_aer_err_detected,
};
diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h
index cd55eba64dfb..a3d91a31e3d8 100644
--- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h
+++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h
@@ -27,6 +27,7 @@
#define ERROR_CHECK_TIMEOUT 100
#define CHECK_DELAY_TIME 100
+#define QM_RESET_WAIT_TIMEOUT 60000
#define QM_SQC_VFT_BASE_SHIFT_V2 28
#define QM_SQC_VFT_BASE_MASK_V2 GENMASK(15, 0)
@@ -128,6 +129,7 @@ struct hisi_acc_vf_migration_file {
struct hisi_acc_vf_core_device {
struct vfio_pci_core_device core_device;
u8 match_done;
+ bool set_reset_flag;
/*
* io_base is only valid when dev_opened is true,
* which is protected by open_mutex.
--
2.33.0
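
Editorial note on the serialization pattern in this patch. The following is
a minimal userspace sketch, not the driver code: it models the
test_and_set_bit() polling loop and the conditional clear_bit() with C11
atomics. RESETTING_BIT, reset_prepare(), reset_done() and the max_polls
parameter are hypothetical names chosen for illustration; the 1 ms sleep
between polls is omitted. The sketch also drops its ownership marker after
releasing the flag, which the diff above does not show.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the QM_RESETTING bit in qm->misc_ctl. */
#define RESETTING_BIT 0x1UL

/* Userspace analogue of test_and_set_bit(): sets the bit, returns its old value. */
static bool test_and_set_resetting(atomic_ulong *ctl)
{
	return (atomic_fetch_or(ctl, RESETTING_BIT) & RESETTING_BIT) != 0;
}

/* Userspace analogue of clear_bit(). */
static void clear_resetting(atomic_ulong *ctl)
{
	atomic_fetch_and(ctl, ~RESETTING_BIT);
}

/*
 * Try to take ownership of the reset flag, retrying at most max_polls
 * times after the first attempt. Returns true only if this caller is
 * the one that set the bit.
 */
static bool reset_prepare(atomic_ulong *ctl, unsigned int max_polls)
{
	unsigned int polls = 0;

	while (test_and_set_resetting(ctl)) {
		/* The driver sleeps here (msleep(1)); omitted in this sketch. */
		if (++polls > max_polls)
			return false;
	}
	return true;
}

/* Release the flag only if this path acquired it, then drop ownership. */
static void reset_done(atomic_ulong *ctl, bool *owned)
{
	if (*owned) {
		clear_resetting(ctl);
		*owned = false;
	}
}
```

The point of tracking ownership is that a path which timed out in
reset_prepare() must not clear a flag that another reset still holds.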
* [PATCH v2 2/4] hisi_acc_vfio_pci: update status after RAS error
2026-01-22 2:02 [PATCH v2 0/4] bugfix some issues under abnormal scenarios Longfang Liu
2026-01-22 2:02 ` [PATCH v2 1/4] hisi_acc_vfio_pci: fix VF reset timeout issue Longfang Liu
@ 2026-01-22 2:02 ` Longfang Liu
2026-01-22 2:02 ` [PATCH v2 3/4] hisi_acc_vfio_pci: resolve duplicate migration states Longfang Liu
` (2 subsequent siblings)
4 siblings, 0 replies; 7+ messages in thread
From: Longfang Liu @ 2026-01-22 2:02 UTC
To: alex.williamson, jgg, jonathan.cameron; +Cc: kvm, linux-kernel, liulongfang
After a RAS error occurs on the accelerator device, the device is
reset. The reset leaves the live migration state abnormal, so the
original state must be restored during the reset process.
Therefore, perform the reset processing whenever live migration
is supported, that is, whenever the device has migration ops.
Signed-off-by: Longfang Liu <liulongfang@huawei.com>
---
drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
index d1e8053640a9..c69caef2e910 100644
--- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
+++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
@@ -1215,8 +1215,7 @@ static void hisi_acc_vf_pci_aer_reset_done(struct pci_dev *pdev)
if (hisi_acc_vdev->set_reset_flag)
clear_bit(QM_RESETTING, &qm->misc_ctl);
- if (hisi_acc_vdev->core_device.vdev.migration_flags !=
- VFIO_MIGRATION_STOP_COPY)
+ if (!hisi_acc_vdev->core_device.vdev.mig_ops)
return;
mutex_lock(&hisi_acc_vdev->state_mutex);
--
2.33.0
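
Editorial note on the changed condition. The commit message does not spell
out why the equality test was insufficient. One plausible reading, stated
here as an assumption, is that a device advertising more than STOP_COPY
(for example STOP_COPY plus PRE_COPY) fails an exact comparison and so
skips the reset handling. The sketch below models both checks in
userspace; struct vdev_model, struct mig_ops and the MIG_* macros are
hypothetical stand-ins, with bit positions mirroring the VFIO UAPI flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins mirroring the VFIO UAPI migration flag bits. */
#define MIG_STOP_COPY (1u << 0)
#define MIG_P2P       (1u << 1)
#define MIG_PRE_COPY  (1u << 2)

struct mig_ops { int unused; };  /* hypothetical placeholder type */

struct vdev_model {
	uint32_t migration_flags;
	const struct mig_ops *mig_ops;  /* NULL when migration is unsupported */
};

/* Old test: skips reset handling unless the flags are exactly STOP_COPY. */
static bool old_check_skips(const struct vdev_model *v)
{
	return v->migration_flags != MIG_STOP_COPY;
}

/* New test: skips reset handling only when no migration ops exist. */
static bool new_check_skips(const struct vdev_model *v)
{
	return v->mig_ops == NULL;
}
```

Under that assumption, a model with migration_flags set to
MIG_STOP_COPY | MIG_PRE_COPY and a non-NULL mig_ops is skipped by the old
check but handled by the new one.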
* [PATCH v2 3/4] hisi_acc_vfio_pci: resolve duplicate migration states
2026-01-22 2:02 [PATCH v2 0/4] bugfix some issues under abnormal scenarios Longfang Liu
2026-01-22 2:02 ` [PATCH v2 1/4] hisi_acc_vfio_pci: fix VF reset timeout issue Longfang Liu
2026-01-22 2:02 ` [PATCH v2 2/4] hisi_acc_vfio_pci: update status after RAS error Longfang Liu
@ 2026-01-22 2:02 ` Longfang Liu
2026-01-22 2:02 ` [PATCH v2 4/4] hisi_acc_vfio_pci: fix the queue parameter anomaly issue Longfang Liu
2026-01-29 21:58 ` [PATCH v2 0/4] bugfix some issues under abnormal scenarios Alex Williamson
4 siblings, 0 replies; 7+ messages in thread
From: Longfang Liu @ 2026-01-22 2:02 UTC
To: alex.williamson, jgg, jonathan.cameron; +Cc: kvm, linux-kernel, liulongfang
In special scenarios involving repeated migrations, after the
first migration completes, if the original VF device is used
again and then migrated to another destination, the state indicating
that data migration has completed for the VF device is not reset.
As a result, the second migration to the destination is skipped
without any data being migrated.
Reset this state when the device is opened so that every subsequent
migration performs a complete data migration.
Signed-off-by: Longfang Liu <liulongfang@huawei.com>
---
drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
index c69caef2e910..483381189579 100644
--- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
+++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
@@ -1569,6 +1569,7 @@ static int hisi_acc_vfio_pci_open_device(struct vfio_device *core_vdev)
}
hisi_acc_vdev->mig_state = VFIO_DEVICE_STATE_RUNNING;
hisi_acc_vdev->dev_opened = true;
+ hisi_acc_vdev->match_done = 0;
mutex_unlock(&hisi_acc_vdev->open_mutex);
}
--
2.33.0
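
Editorial note on the stale-flag problem. The sketch below is a userspace
model, assuming that the match check returns early once match_done is set
(that early return is not part of the diff above). struct vf_model,
check_match(), open_device() and the checks_run counter are hypothetical
names used only to show the effect of clearing the flag on open.

```c
#include <assert.h>
#include <stdbool.h>

struct vf_model {
	bool dev_opened;
	unsigned char match_done;  /* nonzero once a match check has passed */
	int checks_run;            /* hypothetical counter, for illustration only */
};

/* Runs the (simulated) match check unless a previous run marked it done. */
static int check_match(struct vf_model *vf)
{
	if (vf->match_done)
		return 0;  /* cached result: nothing is re-validated */
	vf->checks_run++;
	vf->match_done = 1;
	return 0;
}

/* Open path; reset_match selects the behavior with or without the fix. */
static void open_device(struct vf_model *vf, bool reset_match)
{
	vf->dev_opened = true;
	if (reset_match)
		vf->match_done = 0;
}
```

Opening and checking twice without the reset runs the check once; with the
reset it runs on each open, which corresponds to a full migration each time.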
* [PATCH v2 4/4] hisi_acc_vfio_pci: fix the queue parameter anomaly issue
2026-01-22 2:02 [PATCH v2 0/4] bugfix some issues under abnormal scenarios Longfang Liu
` (2 preceding siblings ...)
2026-01-22 2:02 ` [PATCH v2 3/4] hisi_acc_vfio_pci: resolve duplicate migration states Longfang Liu
@ 2026-01-22 2:02 ` Longfang Liu
2026-01-29 21:58 ` [PATCH v2 0/4] bugfix some issues under abnormal scenarios Alex Williamson
4 siblings, 0 replies; 7+ messages in thread
From: Longfang Liu @ 2026-01-22 2:02 UTC
To: alex.williamson, jgg, jonathan.cameron; +Cc: kvm, linux-kernel, liulongfang
When the number of QPs initialized by the device, as read via vft, is zero,
it indicates either an abnormal device configuration or an abnormal read
result.
Returning 0 directly in this case would allow the live migration operation
to complete successfully, leading to incorrect parameter configuration after
migration and preventing the service from recovering normal functionality.
Therefore, in such situations, an error should be returned to roll back the
live migration operation.
Signed-off-by: Longfang Liu <liulongfang@huawei.com>
---
drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
index 483381189579..e61df3fe0db9 100644
--- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
+++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
@@ -426,7 +426,7 @@ static int vf_qm_check_match(struct hisi_acc_vf_core_device *hisi_acc_vdev,
ret = qm_get_vft(vf_qm, &vf_qm->qp_base);
if (ret <= 0) {
dev_err(dev, "failed to get vft qp nums\n");
- return ret;
+ return ret < 0 ? ret : -EINVAL;
}
if (ret != vf_data->qp_num) {
--
2.33.0
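
Editorial note on the return-value mapping. A helper that returns "count on
success, negative errno on failure" makes zero ambiguous: passing it
through as a return code reads as success. The sketch below is a userspace
model of the corrected mapping; vft_error_code() and check_qp_num() are
hypothetical names, and the mismatch branch returning -EINVAL is an
assumption about code outside the hunk shown above.

```c
#include <assert.h>
#include <errno.h>

/*
 * Map the result of a "count or negative errno" helper to an error code
 * on the failure path: negative values pass through, zero becomes -EINVAL.
 * Intended to be called only when ret <= 0.
 */
static int vft_error_code(int ret)
{
	return ret < 0 ? ret : -EINVAL;
}

/* Model of the check: returns 0 on success, a negative errno otherwise. */
static int check_qp_num(int vft_ret, int expected_qp_num)
{
	if (vft_ret <= 0)
		return vft_error_code(vft_ret);
	if (vft_ret != expected_qp_num)
		return -EINVAL;  /* assumed behavior of the mismatch branch */
	return 0;
}
```

With this mapping a zero QP count can no longer be reported as success, so
the caller sees an error and can roll the migration back.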
* Re: [PATCH v2 0/4] bugfix some issues under abnormal scenarios.
2026-01-22 2:02 [PATCH v2 0/4] bugfix some issues under abnormal scenarios Longfang Liu
` (3 preceding siblings ...)
2026-01-22 2:02 ` [PATCH v2 4/4] hisi_acc_vfio_pci: fix the queue parameter anomaly issue Longfang Liu
@ 2026-01-29 21:58 ` Alex Williamson
2026-01-30 2:08 ` liulongfang
4 siblings, 1 reply; 7+ messages in thread
From: Alex Williamson @ 2026-01-29 21:58 UTC
To: Longfang Liu; +Cc: jgg, jonathan.cameron, kvm, linux-kernel
On Thu, 22 Jan 2026 10:02:01 +0800
Longfang Liu <liulongfang@huawei.com> wrote:
> In certain reset scenarios, repeated migration scenarios, and error injection
> scenarios, it is essential to ensure that the device driver functions properly.
> Issues arising in these scenarios need to be addressed and fixed
>
> Change v1 -> v2
> Fix the reset state handling issue
>
> Longfang Liu (3):
> hisi_acc_vfio_pci: update status after RAS error
> hisi_acc_vfio_pci: resolve duplicate migration states
> hisi_acc_vfio_pci: fix the queue parameter anomaly issue
>
> Weili Qian (1):
> hisi_acc_vfio_pci: fix VF reset timeout issue
>
> .../vfio/pci/hisilicon/hisi_acc_vfio_pci.c | 30 +++++++++++++++++--
> .../vfio/pci/hisilicon/hisi_acc_vfio_pci.h | 2 ++
> 2 files changed, 29 insertions(+), 3 deletions(-)
>
Applied to vfio next branch for v6.20/v7.0. Thanks,
Alex
* Re: [PATCH v2 0/4] bugfix some issues under abnormal scenarios.
2026-01-29 21:58 ` [PATCH v2 0/4] bugfix some issues under abnormal scenarios Alex Williamson
@ 2026-01-30 2:08 ` liulongfang
0 siblings, 0 replies; 7+ messages in thread
From: liulongfang @ 2026-01-30 2:08 UTC
To: Alex Williamson; +Cc: jgg, jonathan.cameron, kvm, linux-kernel
On 2026/1/30 5:58, Alex Williamson wrote:
> On Thu, 22 Jan 2026 10:02:01 +0800
> Longfang Liu <liulongfang@huawei.com> wrote:
>
>> In certain reset scenarios, repeated migration scenarios, and error injection
>> scenarios, it is essential to ensure that the device driver functions properly.
>> Issues arising in these scenarios need to be addressed and fixed
>>
>> Change v1 -> v2
>> Fix the reset state handling issue
>>
>> Longfang Liu (3):
>> hisi_acc_vfio_pci: update status after RAS error
>> hisi_acc_vfio_pci: resolve duplicate migration states
>> hisi_acc_vfio_pci: fix the queue parameter anomaly issue
>>
>> Weili Qian (1):
>> hisi_acc_vfio_pci: fix VF reset timeout issue
>>
>> .../vfio/pci/hisilicon/hisi_acc_vfio_pci.c | 30 +++++++++++++++++--
>> .../vfio/pci/hisilicon/hisi_acc_vfio_pci.h | 2 ++
>> 2 files changed, 29 insertions(+), 3 deletions(-)
>>
>
> Applied to vfio next branch for v6.20/v7.0. Thanks,
>
> Alex
> .
>
Thanks.
Longfang