From: Alex Williamson <alex@shazbot.org>
To: Longfang Liu <liulongfang@huawei.com>
Cc: <alex.williamson@redhat.com>, <jgg@nvidia.com>,
<jonathan.cameron@huawei.com>, <kvm@vger.kernel.org>,
<linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 1/4] hisi_acc_vfio_pci: fix VF reset timeout issue
Date: Fri, 16 Jan 2026 09:47:58 -0700
Message-ID: <20260116094758.09fc60d8@shazbot.org>
In-Reply-To: <20260104070706.4107994-2-liulongfang@huawei.com>

On Sun, 4 Jan 2026 15:07:03 +0800
Longfang Liu <liulongfang@huawei.com> wrote:
> From: Weili Qian <qianweili@huawei.com>
>
> If a device error occurs during live migration, QEMU will
> reset the VF. The VF reset and the device reset are then performed
> simultaneously, and the VF reset times out. Therefore, the QM_RESETTING
> flag is used to ensure that the VF reset and the device reset are
> performed serially.
>
> Fixes: b0eed085903e ("hisi_acc_vfio_pci: Add support for VFIO live migration")
> Signed-off-by: Weili Qian <qianweili@huawei.com>
> ---
> .../vfio/pci/hisilicon/hisi_acc_vfio_pci.c | 24 +++++++++++++++++++
> .../vfio/pci/hisilicon/hisi_acc_vfio_pci.h | 2 ++
> 2 files changed, 26 insertions(+)
>
> diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> index fe2ffcd00d6e..d55365b21f78 100644
> --- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> +++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> @@ -1188,14 +1188,37 @@ hisi_acc_vfio_pci_get_device_state(struct vfio_device *vdev,
>  	return 0;
>  }
>
> +static void hisi_acc_vf_pci_reset_prepare(struct pci_dev *pdev)
> +{
> +	struct hisi_acc_vf_core_device *hisi_acc_vdev = hisi_acc_drvdata(pdev);
> +	struct hisi_qm *qm = hisi_acc_vdev->pf_qm;
> +	struct device *dev = &qm->pdev->dev;
> +	u32 delay = 0;
> +
> +	/* All reset requests need to be queued for processing */
> +	while (test_and_set_bit(QM_RESETTING, &qm->misc_ctl)) {
> +		msleep(1);
> +		if (++delay > QM_RESET_WAIT_TIMEOUT) {
> +			dev_err(dev, "reset prepare failed\n");
> +			return;
> +		}
> +	}
> +
> +	hisi_acc_vdev->set_reset_flag = true;
> +}
> +
>  static void hisi_acc_vf_pci_aer_reset_done(struct pci_dev *pdev)
>  {
>  	struct hisi_acc_vf_core_device *hisi_acc_vdev = hisi_acc_drvdata(pdev);
> +	struct hisi_qm *qm = hisi_acc_vdev->pf_qm;
>
>  	if (hisi_acc_vdev->core_device.vdev.migration_flags !=
>  				VFIO_MIGRATION_STOP_COPY)
>  		return;
>
> +	if (hisi_acc_vdev->set_reset_flag)
> +		clear_bit(QM_RESETTING, &qm->misc_ctl);

.reset_prepare sets QM_RESETTING unconditionally, while .reset_done
clears QM_RESETTING conditionally, based on the migration state. In
patch 2/4 this becomes conditional on the device supporting migration
ops. Doesn't this enable a scenario where a device that does not
support migration leaves QM_RESETTING set, with nothing to ever clear
it? Should the clear_bit() occur before the migration state/capability
check?

Thanks,
Alex

> +
>  	mutex_lock(&hisi_acc_vdev->state_mutex);
>  	hisi_acc_vf_reset(hisi_acc_vdev);
>  	mutex_unlock(&hisi_acc_vdev->state_mutex);
> @@ -1746,6 +1769,7 @@ static const struct pci_device_id hisi_acc_vfio_pci_table[] = {
> MODULE_DEVICE_TABLE(pci, hisi_acc_vfio_pci_table);
>
>  static const struct pci_error_handlers hisi_acc_vf_err_handlers = {
> +	.reset_prepare = hisi_acc_vf_pci_reset_prepare,
>  	.reset_done = hisi_acc_vf_pci_aer_reset_done,
>  	.error_detected = vfio_pci_core_aer_err_detected,
>  };
> diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h
> index cd55eba64dfb..a3d91a31e3d8 100644
> --- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h
> +++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h
> @@ -27,6 +27,7 @@
>
> #define ERROR_CHECK_TIMEOUT 100
> #define CHECK_DELAY_TIME 100
> +#define QM_RESET_WAIT_TIMEOUT 60000
>
> #define QM_SQC_VFT_BASE_SHIFT_V2 28
> #define QM_SQC_VFT_BASE_MASK_V2 GENMASK(15, 0)
> @@ -128,6 +129,7 @@ struct hisi_acc_vf_migration_file {
>  struct hisi_acc_vf_core_device {
>  	struct vfio_pci_core_device core_device;
>  	u8 match_done;
> +	bool set_reset_flag;
>  	/*
>  	 * io_base is only valid when dev_opened is true,
>  	 * which is protected by open_mutex.