public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/5] bugfix some driver issues
@ 2024-12-19  9:17 Longfang Liu
  2024-12-19  9:17 ` [PATCH v2 1/5] hisi_acc_vfio_pci: fix XQE dma address error Longfang Liu
                   ` (5 more replies)
  0 siblings, 6 replies; 21+ messages in thread
From: Longfang Liu @ 2024-12-19  9:17 UTC (permalink / raw)
  To: alex.williamson, jgg, shameerali.kolothum.thodi, jonathan.cameron
  Cc: kvm, linux-kernel, linuxarm, liulongfang

As the test scenarios for the live migration function become
more and more extensive. Some previously undiscovered driver
issues were found.
Update and fix through this patchset.

Longfang Liu (5):
  hisi_acc_vfio_pci: fix XQE dma address error
  hisi_acc_vfio_pci: add eq and aeq interruption restore
  hisi_acc_vfio_pci: bugfix cache write-back issue
  hisi_acc_vfio_pci: bugfix the problem of uninstalling driver
  hisi_acc_vfio_pci: bugfix live migration function without VF device
    driver

 .../vfio/pci/hisilicon/hisi_acc_vfio_pci.c    | 83 ++++++++++++++++---
 .../vfio/pci/hisilicon/hisi_acc_vfio_pci.h    |  9 +-
 2 files changed, 78 insertions(+), 14 deletions(-)

-- 
2.24.0


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v2 1/5] hisi_acc_vfio_pci: fix XQE dma address error
  2024-12-19  9:17 [PATCH v2 0/5] bugfix some driver issues Longfang Liu
@ 2024-12-19  9:17 ` Longfang Liu
  2024-12-19 10:01   ` Shameerali Kolothum Thodi
  2024-12-19  9:17 ` [PATCH v2 2/5] hisi_acc_vfio_pci: add eq and aeq interruption restore Longfang Liu
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 21+ messages in thread
From: Longfang Liu @ 2024-12-19  9:17 UTC (permalink / raw)
  To: alex.williamson, jgg, shameerali.kolothum.thodi, jonathan.cameron
  Cc: kvm, linux-kernel, linuxarm, liulongfang

The dma addresses of EQE and AEQE are wrong after migration and
results in guest kernel-mode encryption services  failure.
Comparing the definition of hardware registers, we found that
there was an error when the data read from the register was
combined into an address. Therefore, the address combination
sequence needs to be corrected.

Even after fixing the above problem, we still have an issue
where the Guest from an old kernel can get migrated to
new kernel and may result in wrong data.

In order to ensure that the address is correct after migration,
if an old magic number is detected, the dma address needs to be
updated.

Fixes:b0eed085903e("hisi_acc_vfio_pci: Add support for VFIO live migration")
Signed-off-by: Longfang Liu <liulongfang@huawei.com>
---
 .../vfio/pci/hisilicon/hisi_acc_vfio_pci.c    | 34 +++++++++++++++----
 .../vfio/pci/hisilicon/hisi_acc_vfio_pci.h    |  9 ++++-
 2 files changed, 36 insertions(+), 7 deletions(-)

diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
index 451c639299eb..8518efea3a52 100644
--- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
+++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
@@ -350,6 +350,27 @@ static int vf_qm_func_stop(struct hisi_qm *qm)
 	return hisi_qm_mb(qm, QM_MB_CMD_PAUSE_QM, 0, 0, 0);
 }
 
+static int vf_qm_magic_check(struct acc_vf_data *vf_data)
+{
+	switch (vf_data->acc_magic) {
+	case ACC_DEV_MAGIC_V2:
+		break;
+	case ACC_DEV_MAGIC_V1:
+		/* Correct dma address */
+		vf_data->eqe_dma = vf_data->qm_eqc_dw[QM_XQC_ADDR_HIGH];
+		vf_data->eqe_dma <<= QM_XQC_ADDR_OFFSET;
+		vf_data->eqe_dma |= vf_data->qm_eqc_dw[QM_XQC_ADDR_LOW];
+		vf_data->aeqe_dma = vf_data->qm_aeqc_dw[QM_XQC_ADDR_HIGH];
+		vf_data->aeqe_dma <<= QM_XQC_ADDR_OFFSET;
+		vf_data->aeqe_dma |= vf_data->qm_aeqc_dw[QM_XQC_ADDR_LOW];
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
 static int vf_qm_check_match(struct hisi_acc_vf_core_device *hisi_acc_vdev,
 			     struct hisi_acc_vf_migration_file *migf)
 {
@@ -363,7 +384,8 @@ static int vf_qm_check_match(struct hisi_acc_vf_core_device *hisi_acc_vdev,
 	if (migf->total_length < QM_MATCH_SIZE || hisi_acc_vdev->match_done)
 		return 0;
 
-	if (vf_data->acc_magic != ACC_DEV_MAGIC) {
+	ret = vf_qm_magic_check(vf_data);
+	if (ret) {
 		dev_err(dev, "failed to match ACC_DEV_MAGIC\n");
 		return -EINVAL;
 	}
@@ -418,7 +440,7 @@ static int vf_qm_get_match_data(struct hisi_acc_vf_core_device *hisi_acc_vdev,
 	int vf_id = hisi_acc_vdev->vf_id;
 	int ret;
 
-	vf_data->acc_magic = ACC_DEV_MAGIC;
+	vf_data->acc_magic = ACC_DEV_MAGIC_V2;
 	/* Save device id */
 	vf_data->dev_id = hisi_acc_vdev->vf_dev->device;
 
@@ -496,12 +518,12 @@ static int vf_qm_read_data(struct hisi_qm *vf_qm, struct acc_vf_data *vf_data)
 		return -EINVAL;
 
 	/* Every reg is 32 bit, the dma address is 64 bit. */
-	vf_data->eqe_dma = vf_data->qm_eqc_dw[1];
+	vf_data->eqe_dma = vf_data->qm_eqc_dw[QM_XQC_ADDR_HIGH];
 	vf_data->eqe_dma <<= QM_XQC_ADDR_OFFSET;
-	vf_data->eqe_dma |= vf_data->qm_eqc_dw[0];
-	vf_data->aeqe_dma = vf_data->qm_aeqc_dw[1];
+	vf_data->eqe_dma |= vf_data->qm_eqc_dw[QM_XQC_ADDR_LOW];
+	vf_data->aeqe_dma = vf_data->qm_aeqc_dw[QM_XQC_ADDR_HIGH];
 	vf_data->aeqe_dma <<= QM_XQC_ADDR_OFFSET;
-	vf_data->aeqe_dma |= vf_data->qm_aeqc_dw[0];
+	vf_data->aeqe_dma |= vf_data->qm_aeqc_dw[QM_XQC_ADDR_LOW];
 
 	/* Through SQC_BT/CQC_BT to get sqc and cqc address */
 	ret = qm_get_sqc(vf_qm, &vf_data->sqc_dma);
diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h
index 245d7537b2bc..2afce68f5a34 100644
--- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h
+++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h
@@ -39,6 +39,9 @@
 #define QM_REG_ADDR_OFFSET	0x0004
 
 #define QM_XQC_ADDR_OFFSET	32U
+#define QM_XQC_ADDR_LOW	0x1
+#define QM_XQC_ADDR_HIGH	0x2
+
 #define QM_VF_AEQ_INT_MASK	0x0004
 #define QM_VF_EQ_INT_MASK	0x000c
 #define QM_IFC_INT_SOURCE_V	0x0020
@@ -50,10 +53,14 @@
 #define QM_EQC_DW0		0X8000
 #define QM_AEQC_DW0		0X8020
 
+enum acc_magic_num {
+	ACC_DEV_MAGIC_V1 = 0XCDCDCDCDFEEDAACC,
+	ACC_DEV_MAGIC_V2 = 0xAACCFEEDDECADEDE,
+};
+
 struct acc_vf_data {
 #define QM_MATCH_SIZE offsetofend(struct acc_vf_data, qm_rsv_state)
 	/* QM match information */
-#define ACC_DEV_MAGIC	0XCDCDCDCDFEEDAACC
 	u64 acc_magic;
 	u32 qp_num;
 	u32 dev_id;
-- 
2.24.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v2 2/5] hisi_acc_vfio_pci: add eq and aeq interruption restore
  2024-12-19  9:17 [PATCH v2 0/5] bugfix some driver issues Longfang Liu
  2024-12-19  9:17 ` [PATCH v2 1/5] hisi_acc_vfio_pci: fix XQE dma address error Longfang Liu
@ 2024-12-19  9:17 ` Longfang Liu
  2024-12-19 10:01   ` Shameerali Kolothum Thodi
  2024-12-19  9:17 ` [PATCH v2 3/5] hisi_acc_vfio_pci: bugfix cache write-back issue Longfang Liu
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 21+ messages in thread
From: Longfang Liu @ 2024-12-19  9:17 UTC (permalink / raw)
  To: alex.williamson, jgg, shameerali.kolothum.thodi, jonathan.cameron
  Cc: kvm, linux-kernel, linuxarm, liulongfang

In order to ensure that the task packets of the accelerator
device are not lost during the migration process, it is necessary
to send an EQ and AEQ command to the device after the live migration
is completed and to update the completion position of the task queue.

Let the device recheck the completed tasks data and if there are
uncollected packets, device resend a task completion interrupt
to the software.

Fixes:b0eed085903e("hisi_acc_vfio_pci: Add support for VFIO live migration")
Signed-off-by: Longfang Liu <liulongfang@huawei.com>
---
 drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
index 8518efea3a52..4c8f1ae5b636 100644
--- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
+++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
@@ -463,6 +463,19 @@ static int vf_qm_get_match_data(struct hisi_acc_vf_core_device *hisi_acc_vdev,
 	return 0;
 }
 
+static void vf_qm_xeqc_save(struct hisi_qm *qm,
+			    struct hisi_acc_vf_migration_file *migf)
+{
+	struct acc_vf_data *vf_data = &migf->vf_data;
+	u16 eq_head, aeq_head;
+
+	eq_head = vf_data->qm_eqc_dw[0] & 0xFFFF;
+	qm_db(qm, 0, QM_DOORBELL_CMD_EQ, eq_head, 0);
+
+	aeq_head = vf_data->qm_aeqc_dw[0] & 0xFFFF;
+	qm_db(qm, 0, QM_DOORBELL_CMD_AEQ, aeq_head, 0);
+}
+
 static int vf_qm_load_data(struct hisi_acc_vf_core_device *hisi_acc_vdev,
 			   struct hisi_acc_vf_migration_file *migf)
 {
@@ -571,6 +584,9 @@ static int vf_qm_state_save(struct hisi_acc_vf_core_device *hisi_acc_vdev,
 		return -EINVAL;
 
 	migf->total_length = sizeof(struct acc_vf_data);
+	/* Save eqc and aeqc interrupt information */
+	vf_qm_xeqc_save(vf_qm, migf);
+
 	return 0;
 }
 
-- 
2.24.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v2 3/5] hisi_acc_vfio_pci: bugfix cache write-back issue
  2024-12-19  9:17 [PATCH v2 0/5] bugfix some driver issues Longfang Liu
  2024-12-19  9:17 ` [PATCH v2 1/5] hisi_acc_vfio_pci: fix XQE dma address error Longfang Liu
  2024-12-19  9:17 ` [PATCH v2 2/5] hisi_acc_vfio_pci: add eq and aeq interruption restore Longfang Liu
@ 2024-12-19  9:17 ` Longfang Liu
  2024-12-19 10:01   ` Shameerali Kolothum Thodi
  2024-12-19  9:17 ` [PATCH v2 4/5] hisi_acc_vfio_pci: bugfix the problem of uninstalling driver Longfang Liu
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 21+ messages in thread
From: Longfang Liu @ 2024-12-19  9:17 UTC (permalink / raw)
  To: alex.williamson, jgg, shameerali.kolothum.thodi, jonathan.cameron
  Cc: kvm, linux-kernel, linuxarm, liulongfang

At present, cache write-back is placed in the device data
copy stage after stopping the device operation.
Writing back to the cache at this stage will cause the data
obtained by the cache to be written back to be empty.

In order to ensure that the cache data is written back
successfully, the data needs to be written back into the
stop device stage.

Fixes:b0eed085903e("hisi_acc_vfio_pci: Add support for VFIO live migration")
Signed-off-by: Longfang Liu <liulongfang@huawei.com>
---
 drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
index 4c8f1ae5b636..c057c0e24693 100644
--- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
+++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
@@ -559,7 +559,6 @@ static int vf_qm_state_save(struct hisi_acc_vf_core_device *hisi_acc_vdev,
 {
 	struct acc_vf_data *vf_data = &migf->vf_data;
 	struct hisi_qm *vf_qm = &hisi_acc_vdev->vf_qm;
-	struct device *dev = &vf_qm->pdev->dev;
 	int ret;
 
 	if (unlikely(qm_wait_dev_not_ready(vf_qm))) {
@@ -573,12 +572,6 @@ static int vf_qm_state_save(struct hisi_acc_vf_core_device *hisi_acc_vdev,
 	vf_data->vf_qm_state = QM_READY;
 	hisi_acc_vdev->vf_qm_state = vf_data->vf_qm_state;
 
-	ret = vf_qm_cache_wb(vf_qm);
-	if (ret) {
-		dev_err(dev, "failed to writeback QM Cache!\n");
-		return ret;
-	}
-
 	ret = vf_qm_read_data(vf_qm, vf_data);
 	if (ret)
 		return -EINVAL;
@@ -1005,6 +998,13 @@ static int hisi_acc_vf_stop_device(struct hisi_acc_vf_core_device *hisi_acc_vdev
 		dev_err(dev, "failed to check QM INT state!\n");
 		return ret;
 	}
+
+	ret = vf_qm_cache_wb(vf_qm);
+	if (ret) {
+		dev_err(dev, "failed to writeback QM cache!\n");
+		return ret;
+	}
+
 	return 0;
 }
 
-- 
2.24.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v2 4/5] hisi_acc_vfio_pci: bugfix the problem of uninstalling driver
  2024-12-19  9:17 [PATCH v2 0/5] bugfix some driver issues Longfang Liu
                   ` (2 preceding siblings ...)
  2024-12-19  9:17 ` [PATCH v2 3/5] hisi_acc_vfio_pci: bugfix cache write-back issue Longfang Liu
@ 2024-12-19  9:17 ` Longfang Liu
  2024-12-19 10:01   ` Shameerali Kolothum Thodi
  2024-12-19  9:18 ` [PATCH v2 5/5] hisi_acc_vfio_pci: bugfix live migration function without VF device driver Longfang Liu
  2024-12-19 10:23 ` [PATCH v2 0/5] bugfix some driver issues Shameerali Kolothum Thodi
  5 siblings, 1 reply; 21+ messages in thread
From: Longfang Liu @ 2024-12-19  9:17 UTC (permalink / raw)
  To: alex.williamson, jgg, shameerali.kolothum.thodi, jonathan.cameron
  Cc: kvm, linux-kernel, linuxarm, liulongfang

In a live migration scenario. If the number of VFs at the
destination is greater than the source, the recovery operation
will fail and qemu will not be able to complete the process and
exit after shutting down the device FD.

This will cause the driver to be unable to be unloaded normally due
to abnormal reference counting of the live migration driver caused
by the abnormal closing operation of fd.

Fixes:b0eed085903e("hisi_acc_vfio_pci: Add support for VFIO live migration")
Signed-off-by: Longfang Liu <liulongfang@huawei.com>
---
 drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
index c057c0e24693..8d9e07ebf4fd 100644
--- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
+++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
@@ -1501,6 +1501,7 @@ static void hisi_acc_vfio_pci_close_device(struct vfio_device *core_vdev)
 	struct hisi_acc_vf_core_device *hisi_acc_vdev = hisi_acc_get_vf_dev(core_vdev);
 	struct hisi_qm *vf_qm = &hisi_acc_vdev->vf_qm;
 
+	hisi_acc_vf_disable_fds(hisi_acc_vdev);
 	mutex_lock(&hisi_acc_vdev->open_mutex);
 	hisi_acc_vdev->dev_opened = false;
 	iounmap(vf_qm->io_base);
-- 
2.24.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v2 5/5] hisi_acc_vfio_pci: bugfix live migration function without VF device driver
  2024-12-19  9:17 [PATCH v2 0/5] bugfix some driver issues Longfang Liu
                   ` (3 preceding siblings ...)
  2024-12-19  9:17 ` [PATCH v2 4/5] hisi_acc_vfio_pci: bugfix the problem of uninstalling driver Longfang Liu
@ 2024-12-19  9:18 ` Longfang Liu
  2024-12-19 10:02   ` Shameerali Kolothum Thodi
  2024-12-19 10:23 ` [PATCH v2 0/5] bugfix some driver issues Shameerali Kolothum Thodi
  5 siblings, 1 reply; 21+ messages in thread
From: Longfang Liu @ 2024-12-19  9:18 UTC (permalink / raw)
  To: alex.williamson, jgg, shameerali.kolothum.thodi, jonathan.cameron
  Cc: kvm, linux-kernel, linuxarm, liulongfang

If the driver of the VF device is not loaded in the Guest OS,
then perform device data migration. The migrated data address will
be NULL.
The live migration recovery operation on the destination side will
access a null address value, which will cause access errors.

Therefore, live migration of VMs without added VF device drivers
does not require device data migration.
In addition, when the queue address data obtained by the destination
is empty, device queue recovery processing will not be performed.

Signed-off-by: Longfang Liu <liulongfang@huawei.com>
---
 drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
index 8d9e07ebf4fd..9a5f7e9bc695 100644
--- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
+++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
@@ -436,6 +436,7 @@ static int vf_qm_get_match_data(struct hisi_acc_vf_core_device *hisi_acc_vdev,
 				struct acc_vf_data *vf_data)
 {
 	struct hisi_qm *pf_qm = hisi_acc_vdev->pf_qm;
+	struct hisi_qm *vf_qm = &hisi_acc_vdev->vf_qm;
 	struct device *dev = &pf_qm->pdev->dev;
 	int vf_id = hisi_acc_vdev->vf_id;
 	int ret;
@@ -460,6 +461,13 @@ static int vf_qm_get_match_data(struct hisi_acc_vf_core_device *hisi_acc_vdev,
 		return ret;
 	}
 
+	/* Get VF driver insmod state */
+	ret = qm_read_regs(vf_qm, QM_VF_STATE, &vf_data->vf_qm_state, 1);
+	if (ret) {
+		dev_err(dev, "failed to read QM_VF_STATE!\n");
+		return ret;
+	}
+
 	return 0;
 }
 
@@ -499,6 +507,12 @@ static int vf_qm_load_data(struct hisi_acc_vf_core_device *hisi_acc_vdev,
 	qm->qp_base = vf_data->qp_base;
 	qm->qp_num = vf_data->qp_num;
 
+	if (!vf_data->eqe_dma || !vf_data->aeqe_dma ||
+	    !vf_data->sqc_dma || !vf_data->cqc_dma) {
+		dev_err(dev, "resume dma addr is NULL!\n");
+		return -EINVAL;
+	}
+
 	ret = qm_set_regs(qm, vf_data);
 	if (ret) {
 		dev_err(dev, "set VF regs failed\n");
@@ -721,6 +735,9 @@ static int hisi_acc_vf_load_state(struct hisi_acc_vf_core_device *hisi_acc_vdev)
 	struct hisi_acc_vf_migration_file *migf = hisi_acc_vdev->resuming_migf;
 	int ret;
 
+	if (hisi_acc_vdev->vf_qm_state != QM_READY)
+		return 0;
+
 	/* Recover data to VF */
 	ret = vf_qm_load_data(hisi_acc_vdev, migf);
 	if (ret) {
@@ -1524,6 +1541,7 @@ static int hisi_acc_vfio_pci_migrn_init_dev(struct vfio_device *core_vdev)
 	hisi_acc_vdev->vf_id = pci_iov_vf_id(pdev) + 1;
 	hisi_acc_vdev->pf_qm = pf_qm;
 	hisi_acc_vdev->vf_dev = pdev;
+	hisi_acc_vdev->vf_qm_state = QM_NOT_READY;
 	mutex_init(&hisi_acc_vdev->state_mutex);
 	mutex_init(&hisi_acc_vdev->open_mutex);
 
-- 
2.24.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* RE: [PATCH v2 1/5] hisi_acc_vfio_pci: fix XQE dma address error
  2024-12-19  9:17 ` [PATCH v2 1/5] hisi_acc_vfio_pci: fix XQE dma address error Longfang Liu
@ 2024-12-19 10:01   ` Shameerali Kolothum Thodi
  2024-12-27  7:24     ` liulongfang
  2025-01-02 22:30     ` Alex Williamson
  0 siblings, 2 replies; 21+ messages in thread
From: Shameerali Kolothum Thodi @ 2024-12-19 10:01 UTC (permalink / raw)
  To: liulongfang, alex.williamson@redhat.com, jgg@nvidia.com,
	Jonathan Cameron
  Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	linuxarm@openeuler.org



> -----Original Message-----
> From: liulongfang <liulongfang@huawei.com>
> Sent: Thursday, December 19, 2024 9:18 AM
> To: alex.williamson@redhat.com; jgg@nvidia.com; Shameerali Kolothum
> Thodi <shameerali.kolothum.thodi@huawei.com>; Jonathan Cameron
> <jonathan.cameron@huawei.com>
> Cc: kvm@vger.kernel.org; linux-kernel@vger.kernel.org;
> linuxarm@openeuler.org; liulongfang <liulongfang@huawei.com>
> Subject: [PATCH v2 1/5] hisi_acc_vfio_pci: fix XQE dma address error
> 
> The dma addresses of EQE and AEQE are wrong after migration and
> results in guest kernel-mode encryption services  failure.
> Comparing the definition of hardware registers, we found that
> there was an error when the data read from the register was
> combined into an address. Therefore, the address combination
> sequence needs to be corrected.
> 
> Even after fixing the above problem, we still have an issue
> where the Guest from an old kernel can get migrated to
> new kernel and may result in wrong data.
> 
> In order to ensure that the address is correct after migration,
> if an old magic number is detected, the dma address needs to be
> updated.
> 
> Fixes:b0eed085903e("hisi_acc_vfio_pci: Add support for VFIO live
> migration")
> Signed-off-by: Longfang Liu <liulongfang@huawei.com>
> ---
>  .../vfio/pci/hisilicon/hisi_acc_vfio_pci.c    | 34 +++++++++++++++----
>  .../vfio/pci/hisilicon/hisi_acc_vfio_pci.h    |  9 ++++-
>  2 files changed, 36 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> index 451c639299eb..8518efea3a52 100644
> --- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> +++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> @@ -350,6 +350,27 @@ static int vf_qm_func_stop(struct hisi_qm *qm)
>  	return hisi_qm_mb(qm, QM_MB_CMD_PAUSE_QM, 0, 0, 0);
>  }
> 
> +static int vf_qm_magic_check(struct acc_vf_data *vf_data)
> +{
> +	switch (vf_data->acc_magic) {
> +	case ACC_DEV_MAGIC_V2:
> +		break;
> +	case ACC_DEV_MAGIC_V1:
> +		/* Correct dma address */
> +		vf_data->eqe_dma = vf_data-
> >qm_eqc_dw[QM_XQC_ADDR_HIGH];
> +		vf_data->eqe_dma <<= QM_XQC_ADDR_OFFSET;
> +		vf_data->eqe_dma |= vf_data-
> >qm_eqc_dw[QM_XQC_ADDR_LOW];
> +		vf_data->aeqe_dma = vf_data-
> >qm_aeqc_dw[QM_XQC_ADDR_HIGH];
> +		vf_data->aeqe_dma <<= QM_XQC_ADDR_OFFSET;
> +		vf_data->aeqe_dma |= vf_data-
> >qm_aeqc_dw[QM_XQC_ADDR_LOW];
> +		break;
> +	default:
> +		return -EINVAL;
> +	}
> +
> +	return 0;
> +}
> +
>  static int vf_qm_check_match(struct hisi_acc_vf_core_device
> *hisi_acc_vdev,
>  			     struct hisi_acc_vf_migration_file *migf)
>  {
> @@ -363,7 +384,8 @@ static int vf_qm_check_match(struct
> hisi_acc_vf_core_device *hisi_acc_vdev,
>  	if (migf->total_length < QM_MATCH_SIZE || hisi_acc_vdev-
> >match_done)
>  		return 0;
> 
> -	if (vf_data->acc_magic != ACC_DEV_MAGIC) {
> +	ret = vf_qm_magic_check(vf_data);
> +	if (ret) {
>  		dev_err(dev, "failed to match ACC_DEV_MAGIC\n");
>  		return -EINVAL;
>  	}
> @@ -418,7 +440,7 @@ static int vf_qm_get_match_data(struct
> hisi_acc_vf_core_device *hisi_acc_vdev,
>  	int vf_id = hisi_acc_vdev->vf_id;
>  	int ret;
> 
> -	vf_data->acc_magic = ACC_DEV_MAGIC;
> +	vf_data->acc_magic = ACC_DEV_MAGIC_V2;
>  	/* Save device id */
>  	vf_data->dev_id = hisi_acc_vdev->vf_dev->device;
> 
> @@ -496,12 +518,12 @@ static int vf_qm_read_data(struct hisi_qm
> *vf_qm, struct acc_vf_data *vf_data)
>  		return -EINVAL;
> 
>  	/* Every reg is 32 bit, the dma address is 64 bit. */
> -	vf_data->eqe_dma = vf_data->qm_eqc_dw[1];
> +	vf_data->eqe_dma = vf_data->qm_eqc_dw[QM_XQC_ADDR_HIGH];
>  	vf_data->eqe_dma <<= QM_XQC_ADDR_OFFSET;
> -	vf_data->eqe_dma |= vf_data->qm_eqc_dw[0];
> -	vf_data->aeqe_dma = vf_data->qm_aeqc_dw[1];
> +	vf_data->eqe_dma |= vf_data->qm_eqc_dw[QM_XQC_ADDR_LOW];
> +	vf_data->aeqe_dma = vf_data-
> >qm_aeqc_dw[QM_XQC_ADDR_HIGH];
>  	vf_data->aeqe_dma <<= QM_XQC_ADDR_OFFSET;
> -	vf_data->aeqe_dma |= vf_data->qm_aeqc_dw[0];
> +	vf_data->aeqe_dma |= vf_data-
> >qm_aeqc_dw[QM_XQC_ADDR_LOW];
> 
>  	/* Through SQC_BT/CQC_BT to get sqc and cqc address */
>  	ret = qm_get_sqc(vf_qm, &vf_data->sqc_dma);
> diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h
> b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h
> index 245d7537b2bc..2afce68f5a34 100644
> --- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h
> +++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h
> @@ -39,6 +39,9 @@
>  #define QM_REG_ADDR_OFFSET	0x0004
> 
>  #define QM_XQC_ADDR_OFFSET	32U
> +#define QM_XQC_ADDR_LOW	0x1
> +#define QM_XQC_ADDR_HIGH	0x2
> +
>  #define QM_VF_AEQ_INT_MASK	0x0004
>  #define QM_VF_EQ_INT_MASK	0x000c
>  #define QM_IFC_INT_SOURCE_V	0x0020
> @@ -50,10 +53,14 @@
>  #define QM_EQC_DW0		0X8000
>  #define QM_AEQC_DW0		0X8020
> 
> +enum acc_magic_num {
> +	ACC_DEV_MAGIC_V1 = 0XCDCDCDCDFEEDAACC,
> +	ACC_DEV_MAGIC_V2 = 0xAACCFEEDDECADEDE,


I think we have discussed this before that having some kind of 
version info embed into magic_num will be beneficial going 
forward. ie, may be use the last 4 bytes for denoting version.

ACC_DEV_MAGIC_V2 = 0xAACCFEEDDECA0002

The reason being, otherwise we have to come up with a random
magic each time when a fix like this is required in future.

Thanks,
Shameer


^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: [PATCH v2 2/5] hisi_acc_vfio_pci: add eq and aeq interruption restore
  2024-12-19  9:17 ` [PATCH v2 2/5] hisi_acc_vfio_pci: add eq and aeq interruption restore Longfang Liu
@ 2024-12-19 10:01   ` Shameerali Kolothum Thodi
  0 siblings, 0 replies; 21+ messages in thread
From: Shameerali Kolothum Thodi @ 2024-12-19 10:01 UTC (permalink / raw)
  To: liulongfang, alex.williamson@redhat.com, jgg@nvidia.com,
	Jonathan Cameron
  Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	linuxarm@openeuler.org



> -----Original Message-----
> From: liulongfang <liulongfang@huawei.com>
> Sent: Thursday, December 19, 2024 9:18 AM
> To: alex.williamson@redhat.com; jgg@nvidia.com; Shameerali Kolothum
> Thodi <shameerali.kolothum.thodi@huawei.com>; Jonathan Cameron
> <jonathan.cameron@huawei.com>
> Cc: kvm@vger.kernel.org; linux-kernel@vger.kernel.org;
> linuxarm@openeuler.org; liulongfang <liulongfang@huawei.com>
> Subject: [PATCH v2 2/5] hisi_acc_vfio_pci: add eq and aeq interruption
> restore
> 
> In order to ensure that the task packets of the accelerator
> device are not lost during the migration process, it is necessary
> to send an EQ and AEQ command to the device after the live migration
> is completed and to update the completion position of the task queue.
> 
> Let the device recheck the completed tasks data and if there are
> uncollected packets, device resend a task completion interrupt
> to the software.
> 
> Fixes:b0eed085903e("hisi_acc_vfio_pci: Add support for VFIO live
> migration")
> Signed-off-by: Longfang Liu <liulongfang@huawei.com>
> ---
>  drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
> 
> diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> index 8518efea3a52..4c8f1ae5b636 100644
> --- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> +++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> @@ -463,6 +463,19 @@ static int vf_qm_get_match_data(struct
> hisi_acc_vf_core_device *hisi_acc_vdev,
>  	return 0;
>  }
> 
> +static void vf_qm_xeqc_save(struct hisi_qm *qm,
> +			    struct hisi_acc_vf_migration_file *migf)
> +{
> +	struct acc_vf_data *vf_data = &migf->vf_data;
> +	u16 eq_head, aeq_head;
> +
> +	eq_head = vf_data->qm_eqc_dw[0] & 0xFFFF;
> +	qm_db(qm, 0, QM_DOORBELL_CMD_EQ, eq_head, 0);
> +
> +	aeq_head = vf_data->qm_aeqc_dw[0] & 0xFFFF;
> +	qm_db(qm, 0, QM_DOORBELL_CMD_AEQ, aeq_head, 0);
> +}
> +
>  static int vf_qm_load_data(struct hisi_acc_vf_core_device *hisi_acc_vdev,
>  			   struct hisi_acc_vf_migration_file *migf)
>  {
> @@ -571,6 +584,9 @@ static int vf_qm_state_save(struct
> hisi_acc_vf_core_device *hisi_acc_vdev,
>  		return -EINVAL;
> 
>  	migf->total_length = sizeof(struct acc_vf_data);
> +	/* Save eqc and aeqc interrupt information */
> +	vf_qm_xeqc_save(vf_qm, migf);
> +
>  	return 0;
>  }

Reviewed-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>

Thanks,
Shameer



^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: [PATCH v2 3/5] hisi_acc_vfio_pci: bugfix cache write-back issue
  2024-12-19  9:17 ` [PATCH v2 3/5] hisi_acc_vfio_pci: bugfix cache write-back issue Longfang Liu
@ 2024-12-19 10:01   ` Shameerali Kolothum Thodi
  0 siblings, 0 replies; 21+ messages in thread
From: Shameerali Kolothum Thodi @ 2024-12-19 10:01 UTC (permalink / raw)
  To: liulongfang, alex.williamson@redhat.com, jgg@nvidia.com,
	Jonathan Cameron
  Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	linuxarm@openeuler.org



> -----Original Message-----
> From: liulongfang <liulongfang@huawei.com>
> Sent: Thursday, December 19, 2024 9:18 AM
> To: alex.williamson@redhat.com; jgg@nvidia.com; Shameerali Kolothum
> Thodi <shameerali.kolothum.thodi@huawei.com>; Jonathan Cameron
> <jonathan.cameron@huawei.com>
> Cc: kvm@vger.kernel.org; linux-kernel@vger.kernel.org;
> linuxarm@openeuler.org; liulongfang <liulongfang@huawei.com>
> Subject: [PATCH v2 3/5] hisi_acc_vfio_pci: bugfix cache write-back issue
> 
> At present, cache write-back is placed in the device data
> copy stage after stopping the device operation.
> Writing back to the cache at this stage will cause the data
> obtained by the cache to be written back to be empty.
> 
> In order to ensure that the cache data is written back
> successfully, the data needs to be written back into the
> stop device stage.
> 
> Fixes:b0eed085903e("hisi_acc_vfio_pci: Add support for VFIO live
> migration")
> Signed-off-by: Longfang Liu <liulongfang@huawei.com>
> ---
>  drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c | 14 +++++++-------
>  1 file changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> index 4c8f1ae5b636..c057c0e24693 100644
> --- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> +++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> @@ -559,7 +559,6 @@ static int vf_qm_state_save(struct
> hisi_acc_vf_core_device *hisi_acc_vdev,
>  {
>  	struct acc_vf_data *vf_data = &migf->vf_data;
>  	struct hisi_qm *vf_qm = &hisi_acc_vdev->vf_qm;
> -	struct device *dev = &vf_qm->pdev->dev;
>  	int ret;
> 
>  	if (unlikely(qm_wait_dev_not_ready(vf_qm))) {
> @@ -573,12 +572,6 @@ static int vf_qm_state_save(struct
> hisi_acc_vf_core_device *hisi_acc_vdev,
>  	vf_data->vf_qm_state = QM_READY;
>  	hisi_acc_vdev->vf_qm_state = vf_data->vf_qm_state;
> 
> -	ret = vf_qm_cache_wb(vf_qm);
> -	if (ret) {
> -		dev_err(dev, "failed to writeback QM Cache!\n");
> -		return ret;
> -	}
> -
>  	ret = vf_qm_read_data(vf_qm, vf_data);
>  	if (ret)
>  		return -EINVAL;
> @@ -1005,6 +998,13 @@ static int hisi_acc_vf_stop_device(struct
> hisi_acc_vf_core_device *hisi_acc_vdev
>  		dev_err(dev, "failed to check QM INT state!\n");
>  		return ret;
>  	}
> +
> +	ret = vf_qm_cache_wb(vf_qm);
> +	if (ret) {
> +		dev_err(dev, "failed to writeback QM cache!\n");
> +		return ret;
> +	}
> +
>  	return 0;
>  }

Reviewed-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>

Thanks,
Shameer



^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: [PATCH v2 4/5] hisi_acc_vfio_pci: bugfix the problem of uninstalling driver
  2024-12-19  9:17 ` [PATCH v2 4/5] hisi_acc_vfio_pci: bugfix the problem of uninstalling driver Longfang Liu
@ 2024-12-19 10:01   ` Shameerali Kolothum Thodi
  0 siblings, 0 replies; 21+ messages in thread
From: Shameerali Kolothum Thodi @ 2024-12-19 10:01 UTC (permalink / raw)
  To: liulongfang, alex.williamson@redhat.com, jgg@nvidia.com,
	Jonathan Cameron
  Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	linuxarm@openeuler.org



> -----Original Message-----
> From: liulongfang <liulongfang@huawei.com>
> Sent: Thursday, December 19, 2024 9:18 AM
> To: alex.williamson@redhat.com; jgg@nvidia.com; Shameerali Kolothum
> Thodi <shameerali.kolothum.thodi@huawei.com>; Jonathan Cameron
> <jonathan.cameron@huawei.com>
> Cc: kvm@vger.kernel.org; linux-kernel@vger.kernel.org;
> linuxarm@openeuler.org; liulongfang <liulongfang@huawei.com>
> Subject: [PATCH v2 4/5] hisi_acc_vfio_pci: bugfix the problem of uninstalling
> driver
> 
> In a live migration scenario. If the number of VFs at the destination is
> greater than the source, the recovery operation will fail and qemu will not
> be able to complete the process and exit after shutting down the device FD.
> 
> This will cause the driver to be unable to be unloaded normally due to
> abnormal reference counting of the live migration driver caused by the
> abnormal closing operation of fd.
> 
> Fixes:b0eed085903e("hisi_acc_vfio_pci: Add support for VFIO live
> migration")
> Signed-off-by: Longfang Liu <liulongfang@huawei.com>
> ---
>  drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> index c057c0e24693..8d9e07ebf4fd 100644
> --- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> +++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> @@ -1501,6 +1501,7 @@ static void hisi_acc_vfio_pci_close_device(struct
> vfio_device *core_vdev)
>  	struct hisi_acc_vf_core_device *hisi_acc_vdev =
> hisi_acc_get_vf_dev(core_vdev);
>  	struct hisi_qm *vf_qm = &hisi_acc_vdev->vf_qm;
> 
> +	hisi_acc_vf_disable_fds(hisi_acc_vdev);
>  	mutex_lock(&hisi_acc_vdev->open_mutex);
>  	hisi_acc_vdev->dev_opened = false;
>  	iounmap(vf_qm->io_base);

Reviewed-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>

Thanks,
Shameer

^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: [PATCH v2 5/5] hisi_acc_vfio_pci: bugfix live migration function without VF device driver
  2024-12-19  9:18 ` [PATCH v2 5/5] hisi_acc_vfio_pci: bugfix live migration function without VF device driver Longfang Liu
@ 2024-12-19 10:02   ` Shameerali Kolothum Thodi
  2024-12-27  7:20     ` liulongfang
  0 siblings, 1 reply; 21+ messages in thread
From: Shameerali Kolothum Thodi @ 2024-12-19 10:02 UTC (permalink / raw)
  To: liulongfang, alex.williamson@redhat.com, jgg@nvidia.com,
	Jonathan Cameron
  Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	linuxarm@openeuler.org



> -----Original Message-----
> From: liulongfang <liulongfang@huawei.com>
> Sent: Thursday, December 19, 2024 9:18 AM
> To: alex.williamson@redhat.com; jgg@nvidia.com; Shameerali Kolothum
> Thodi <shameerali.kolothum.thodi@huawei.com>; Jonathan Cameron
> <jonathan.cameron@huawei.com>
> Cc: kvm@vger.kernel.org; linux-kernel@vger.kernel.org;
> linuxarm@openeuler.org; liulongfang <liulongfang@huawei.com>
> Subject: [PATCH v2 5/5] hisi_acc_vfio_pci: bugfix live migration function
> without VF device driver
> 
> If the driver of the VF device is not loaded in the Guest OS,
> then perform device data migration. The migrated data address will
> be NULL.
> The live migration recovery operation on the destination side will
> access a null address value, which will cause access errors.
> 
> Therefore, live migration of VMs without added VF device drivers
> does not require device data migration.
> In addition, when the queue address data obtained by the destination
> is empty, device queue recovery processing will not be performed.
> 
> Signed-off-by: Longfang Liu <liulongfang@huawei.com>

Why this doesn't need a Fixes tag?

> ---
>  drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c | 18 ++++++++++++++++++
>  1 file changed, 18 insertions(+)
> 
> diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> index 8d9e07ebf4fd..9a5f7e9bc695 100644
> --- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> +++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> @@ -436,6 +436,7 @@ static int vf_qm_get_match_data(struct
> hisi_acc_vf_core_device *hisi_acc_vdev,
>  				struct acc_vf_data *vf_data)
>  {
>  	struct hisi_qm *pf_qm = hisi_acc_vdev->pf_qm;
> +	struct hisi_qm *vf_qm = &hisi_acc_vdev->vf_qm;
>  	struct device *dev = &pf_qm->pdev->dev;
>  	int vf_id = hisi_acc_vdev->vf_id;
>  	int ret;
> @@ -460,6 +461,13 @@ static int vf_qm_get_match_data(struct
> hisi_acc_vf_core_device *hisi_acc_vdev,
>  		return ret;
>  	}
> 
> +	/* Get VF driver insmod state */
> +	ret = qm_read_regs(vf_qm, QM_VF_STATE, &vf_data->vf_qm_state,
> 1);
> +	if (ret) {
> +		dev_err(dev, "failed to read QM_VF_STATE!\n");
> +		return ret;
> +	}
> +
>  	return 0;
>  }
> 
> @@ -499,6 +507,12 @@ static int vf_qm_load_data(struct
> hisi_acc_vf_core_device *hisi_acc_vdev,
>  	qm->qp_base = vf_data->qp_base;
>  	qm->qp_num = vf_data->qp_num;
> 
> +	if (!vf_data->eqe_dma || !vf_data->aeqe_dma ||
> +	    !vf_data->sqc_dma || !vf_data->cqc_dma) {
> +		dev_err(dev, "resume dma addr is NULL!\n");
> +		return -EINVAL;
> +	}
> +

So this is to cover the corner case where the Guest has loaded the driver
(QM_READY set) but not configured the DMA address? When this will happen?
I thought we are setting QM_READY in guest after all configurations.

Thanks,
Shameer



^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: [PATCH v2 0/5] bugfix some driver issues
  2024-12-19  9:17 [PATCH v2 0/5] bugfix some driver issues Longfang Liu
                   ` (4 preceding siblings ...)
  2024-12-19  9:18 ` [PATCH v2 5/5] hisi_acc_vfio_pci: bugfix live migration function without VF device driver Longfang Liu
@ 2024-12-19 10:23 ` Shameerali Kolothum Thodi
  5 siblings, 0 replies; 21+ messages in thread
From: Shameerali Kolothum Thodi @ 2024-12-19 10:23 UTC (permalink / raw)
  To: liulongfang, alex.williamson@redhat.com, jgg@nvidia.com,
	Jonathan Cameron
  Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	linuxarm@openeuler.org

Hi Longfang,

> -----Original Message-----
> From: liulongfang <liulongfang@huawei.com>
> Sent: Thursday, December 19, 2024 9:18 AM
> To: alex.williamson@redhat.com; jgg@nvidia.com; Shameerali Kolothum
> Thodi <shameerali.kolothum.thodi@huawei.com>; Jonathan Cameron
> <jonathan.cameron@huawei.com>
> Cc: kvm@vger.kernel.org; linux-kernel@vger.kernel.org;
> linuxarm@openeuler.org; liulongfang <liulongfang@huawei.com>
> Subject: [PATCH v2 0/5] bugfix some driver issues
> 
> As the test scenarios for the live migration function become
> more and more extensive. Some previously undiscovered driver
> issues were found.
> Update and fix through this patchset.
> 

A changes history will be beneficial to understand what changed from v1.

Also I have a request.... As part of this series, could you please include the
below fix to this driver as well as we do have the same potential problem.

https://lore.kernel.org/all/20241021123843.42979-1-giovanni.cabiddu@intel.com/

Thanks,
Shameer

> Longfang Liu (5):
>   hisi_acc_vfio_pci: fix XQE dma address error
>   hisi_acc_vfio_pci: add eq and aeq interruption restore
>   hisi_acc_vfio_pci: bugfix cache write-back issue
>   hisi_acc_vfio_pci: bugfix the problem of uninstalling driver
>   hisi_acc_vfio_pci: bugfix live migration function without VF device
>     driver
> 
>  .../vfio/pci/hisilicon/hisi_acc_vfio_pci.c    | 83 ++++++++++++++++---
>  .../vfio/pci/hisilicon/hisi_acc_vfio_pci.h    |  9 +-
>  2 files changed, 78 insertions(+), 14 deletions(-)
> 
> --
> 2.24.0


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 5/5] hisi_acc_vfio_pci: bugfix live migration function without VF device driver
  2024-12-19 10:02   ` Shameerali Kolothum Thodi
@ 2024-12-27  7:20     ` liulongfang
  0 siblings, 0 replies; 21+ messages in thread
From: liulongfang @ 2024-12-27  7:20 UTC (permalink / raw)
  To: Shameerali Kolothum Thodi, alex.williamson@redhat.com,
	jgg@nvidia.com, Jonathan Cameron
  Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	linuxarm@openeuler.org

On 2024/12/19 18:02, Shameerali Kolothum Thodi wrote:
> 
> 
>> -----Original Message-----
>> From: liulongfang <liulongfang@huawei.com>
>> Sent: Thursday, December 19, 2024 9:18 AM
>> To: alex.williamson@redhat.com; jgg@nvidia.com; Shameerali Kolothum
>> Thodi <shameerali.kolothum.thodi@huawei.com>; Jonathan Cameron
>> <jonathan.cameron@huawei.com>
>> Cc: kvm@vger.kernel.org; linux-kernel@vger.kernel.org;
>> linuxarm@openeuler.org; liulongfang <liulongfang@huawei.com>
>> Subject: [PATCH v2 5/5] hisi_acc_vfio_pci: bugfix live migration function
>> without VF device driver
>>
>> If the driver of the VF device is not loaded in the Guest OS,
>> then perform device data migration. The migrated data address will
>> be NULL.
>> The live migration recovery operation on the destination side will
>> access a null address value, which will cause access errors.
>>
>> Therefore, live migration of VMs without added VF device drivers
>> does not require device data migration.
>> In addition, when the queue address data obtained by the destination
>> is empty, device queue recovery processing will not be performed.
>>
>> Signed-off-by: Longfang Liu <liulongfang@huawei.com>
> 
> Why this doesn't need a Fixes tag?
>

If we need to synchronize to the stable branch, we can add the Fixes tag here.

>> ---
>>  drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c | 18 ++++++++++++++++++
>>  1 file changed, 18 insertions(+)
>>
>> diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
>> b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
>> index 8d9e07ebf4fd..9a5f7e9bc695 100644
>> --- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
>> +++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
>> @@ -436,6 +436,7 @@ static int vf_qm_get_match_data(struct
>> hisi_acc_vf_core_device *hisi_acc_vdev,
>>  				struct acc_vf_data *vf_data)
>>  {
>>  	struct hisi_qm *pf_qm = hisi_acc_vdev->pf_qm;
>> +	struct hisi_qm *vf_qm = &hisi_acc_vdev->vf_qm;
>>  	struct device *dev = &pf_qm->pdev->dev;
>>  	int vf_id = hisi_acc_vdev->vf_id;
>>  	int ret;
>> @@ -460,6 +461,13 @@ static int vf_qm_get_match_data(struct
>> hisi_acc_vf_core_device *hisi_acc_vdev,
>>  		return ret;
>>  	}
>>
>> +	/* Get VF driver insmod state */
>> +	ret = qm_read_regs(vf_qm, QM_VF_STATE, &vf_data->vf_qm_state,
>> 1);
>> +	if (ret) {
>> +		dev_err(dev, "failed to read QM_VF_STATE!\n");
>> +		return ret;
>> +	}
>> +
>>  	return 0;
>>  }
>>
>> @@ -499,6 +507,12 @@ static int vf_qm_load_data(struct
>> hisi_acc_vf_core_device *hisi_acc_vdev,
>>  	qm->qp_base = vf_data->qp_base;
>>  	qm->qp_num = vf_data->qp_num;
>>
>> +	if (!vf_data->eqe_dma || !vf_data->aeqe_dma ||
>> +	    !vf_data->sqc_dma || !vf_data->cqc_dma) {
>> +		dev_err(dev, "resume dma addr is NULL!\n");
>> +		return -EINVAL;
>> +	}
>> +
> 
> So this is to cover the corner case where the Guest has loaded the driver
> (QM_READY set) but not configured the DMA address? When this will happen?
> I thought we are setting QM_READY in guest after all configurations.
>

When performing remote live migration, there are abnormal scenarios where the dma
value received through the network is empty.
The driver needs to prevent this null DMA address data from being sent to the hardware.

Thanks.
Longfang.

> Thanks,
> Shameer
> 
> 
> .
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 1/5] hisi_acc_vfio_pci: fix XQE dma address error
  2024-12-19 10:01   ` Shameerali Kolothum Thodi
@ 2024-12-27  7:24     ` liulongfang
  2025-01-02 22:30     ` Alex Williamson
  1 sibling, 0 replies; 21+ messages in thread
From: liulongfang @ 2024-12-27  7:24 UTC (permalink / raw)
  To: Shameerali Kolothum Thodi, alex.williamson@redhat.com,
	jgg@nvidia.com, Jonathan Cameron
  Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	linuxarm@openeuler.org

On 2024/12/19 18:01, Shameerali Kolothum Thodi wrote:
> 
> 
>> -----Original Message-----
>> From: liulongfang <liulongfang@huawei.com>
>> Sent: Thursday, December 19, 2024 9:18 AM
>> To: alex.williamson@redhat.com; jgg@nvidia.com; Shameerali Kolothum
>> Thodi <shameerali.kolothum.thodi@huawei.com>; Jonathan Cameron
>> <jonathan.cameron@huawei.com>
>> Cc: kvm@vger.kernel.org; linux-kernel@vger.kernel.org;
>> linuxarm@openeuler.org; liulongfang <liulongfang@huawei.com>
>> Subject: [PATCH v2 1/5] hisi_acc_vfio_pci: fix XQE dma address error
>>
>> The dma addresses of EQE and AEQE are wrong after migration and
>> results in guest kernel-mode encryption services  failure.
>> Comparing the definition of hardware registers, we found that
>> there was an error when the data read from the register was
>> combined into an address. Therefore, the address combination
>> sequence needs to be corrected.
>>
>> Even after fixing the above problem, we still have an issue
>> where the Guest from an old kernel can get migrated to
>> new kernel and may result in wrong data.
>>
>> In order to ensure that the address is correct after migration,
>> if an old magic number is detected, the dma address needs to be
>> updated.
>>
>> Fixes:b0eed085903e("hisi_acc_vfio_pci: Add support for VFIO live
>> migration")
>> Signed-off-by: Longfang Liu <liulongfang@huawei.com>
>> ---
>>  .../vfio/pci/hisilicon/hisi_acc_vfio_pci.c    | 34 +++++++++++++++----
>>  .../vfio/pci/hisilicon/hisi_acc_vfio_pci.h    |  9 ++++-
>>  2 files changed, 36 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
>> b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
>> index 451c639299eb..8518efea3a52 100644
>> --- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
>> +++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
>> @@ -350,6 +350,27 @@ static int vf_qm_func_stop(struct hisi_qm *qm)
>>  	return hisi_qm_mb(qm, QM_MB_CMD_PAUSE_QM, 0, 0, 0);
>>  }
>>
>> +static int vf_qm_magic_check(struct acc_vf_data *vf_data)
>> +{
>> +	switch (vf_data->acc_magic) {
>> +	case ACC_DEV_MAGIC_V2:
>> +		break;
>> +	case ACC_DEV_MAGIC_V1:
>> +		/* Correct dma address */
>> +		vf_data->eqe_dma = vf_data-
>>> qm_eqc_dw[QM_XQC_ADDR_HIGH];
>> +		vf_data->eqe_dma <<= QM_XQC_ADDR_OFFSET;
>> +		vf_data->eqe_dma |= vf_data-
>>> qm_eqc_dw[QM_XQC_ADDR_LOW];
>> +		vf_data->aeqe_dma = vf_data-
>>> qm_aeqc_dw[QM_XQC_ADDR_HIGH];
>> +		vf_data->aeqe_dma <<= QM_XQC_ADDR_OFFSET;
>> +		vf_data->aeqe_dma |= vf_data-
>>> qm_aeqc_dw[QM_XQC_ADDR_LOW];
>> +		break;
>> +	default:
>> +		return -EINVAL;
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>>  static int vf_qm_check_match(struct hisi_acc_vf_core_device
>> *hisi_acc_vdev,
>>  			     struct hisi_acc_vf_migration_file *migf)
>>  {
>> @@ -363,7 +384,8 @@ static int vf_qm_check_match(struct
>> hisi_acc_vf_core_device *hisi_acc_vdev,
>>  	if (migf->total_length < QM_MATCH_SIZE || hisi_acc_vdev-
>>> match_done)
>>  		return 0;
>>
>> -	if (vf_data->acc_magic != ACC_DEV_MAGIC) {
>> +	ret = vf_qm_magic_check(vf_data);
>> +	if (ret) {
>>  		dev_err(dev, "failed to match ACC_DEV_MAGIC\n");
>>  		return -EINVAL;
>>  	}
>> @@ -418,7 +440,7 @@ static int vf_qm_get_match_data(struct
>> hisi_acc_vf_core_device *hisi_acc_vdev,
>>  	int vf_id = hisi_acc_vdev->vf_id;
>>  	int ret;
>>
>> -	vf_data->acc_magic = ACC_DEV_MAGIC;
>> +	vf_data->acc_magic = ACC_DEV_MAGIC_V2;
>>  	/* Save device id */
>>  	vf_data->dev_id = hisi_acc_vdev->vf_dev->device;
>>
>> @@ -496,12 +518,12 @@ static int vf_qm_read_data(struct hisi_qm
>> *vf_qm, struct acc_vf_data *vf_data)
>>  		return -EINVAL;
>>
>>  	/* Every reg is 32 bit, the dma address is 64 bit. */
>> -	vf_data->eqe_dma = vf_data->qm_eqc_dw[1];
>> +	vf_data->eqe_dma = vf_data->qm_eqc_dw[QM_XQC_ADDR_HIGH];
>>  	vf_data->eqe_dma <<= QM_XQC_ADDR_OFFSET;
>> -	vf_data->eqe_dma |= vf_data->qm_eqc_dw[0];
>> -	vf_data->aeqe_dma = vf_data->qm_aeqc_dw[1];
>> +	vf_data->eqe_dma |= vf_data->qm_eqc_dw[QM_XQC_ADDR_LOW];
>> +	vf_data->aeqe_dma = vf_data-
>>> qm_aeqc_dw[QM_XQC_ADDR_HIGH];
>>  	vf_data->aeqe_dma <<= QM_XQC_ADDR_OFFSET;
>> -	vf_data->aeqe_dma |= vf_data->qm_aeqc_dw[0];
>> +	vf_data->aeqe_dma |= vf_data-
>>> qm_aeqc_dw[QM_XQC_ADDR_LOW];
>>
>>  	/* Through SQC_BT/CQC_BT to get sqc and cqc address */
>>  	ret = qm_get_sqc(vf_qm, &vf_data->sqc_dma);
>> diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h
>> b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h
>> index 245d7537b2bc..2afce68f5a34 100644
>> --- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h
>> +++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h
>> @@ -39,6 +39,9 @@
>>  #define QM_REG_ADDR_OFFSET	0x0004
>>
>>  #define QM_XQC_ADDR_OFFSET	32U
>> +#define QM_XQC_ADDR_LOW	0x1
>> +#define QM_XQC_ADDR_HIGH	0x2
>> +
>>  #define QM_VF_AEQ_INT_MASK	0x0004
>>  #define QM_VF_EQ_INT_MASK	0x000c
>>  #define QM_IFC_INT_SOURCE_V	0x0020
>> @@ -50,10 +53,14 @@
>>  #define QM_EQC_DW0		0X8000
>>  #define QM_AEQC_DW0		0X8020
>>
>> +enum acc_magic_num {
>> +	ACC_DEV_MAGIC_V1 = 0XCDCDCDCDFEEDAACC,
>> +	ACC_DEV_MAGIC_V2 = 0xAACCFEEDDECADEDE,
> 
> 
> I think we have discussed this before that having some kind of 
> version info embed into magic_num will be beneficial going 
> forward. ie, may be use the last 4 bytes for denoting version.
> 
> ACC_DEV_MAGIC_V2 = 0xAACCFEEDDECA0002
>

OK, so the subsequent magic number no longer needs a random value,
but a predictable and certain value.

Thanks,
Longfang.

> The reason being, otherwise we have to come up with a random
> magic each time when a fix like this is required in future.
> 
> Thanks,
> Shameer
> 
> .
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 1/5] hisi_acc_vfio_pci: fix XQE dma address error
  2024-12-19 10:01   ` Shameerali Kolothum Thodi
  2024-12-27  7:24     ` liulongfang
@ 2025-01-02 22:30     ` Alex Williamson
  2025-01-13  9:43       ` liulongfang
  1 sibling, 1 reply; 21+ messages in thread
From: Alex Williamson @ 2025-01-02 22:30 UTC (permalink / raw)
  To: Shameerali Kolothum Thodi
  Cc: liulongfang, jgg@nvidia.com, Jonathan Cameron,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	linuxarm@openeuler.org

On Thu, 19 Dec 2024 10:01:03 +0000
Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com> wrote:

> > -----Original Message-----
> > From: liulongfang <liulongfang@huawei.com>
> > Sent: Thursday, December 19, 2024 9:18 AM
> > To: alex.williamson@redhat.com; jgg@nvidia.com; Shameerali Kolothum
> > Thodi <shameerali.kolothum.thodi@huawei.com>; Jonathan Cameron
> > <jonathan.cameron@huawei.com>
> > Cc: kvm@vger.kernel.org; linux-kernel@vger.kernel.org;
> > linuxarm@openeuler.org; liulongfang <liulongfang@huawei.com>
> > Subject: [PATCH v2 1/5] hisi_acc_vfio_pci: fix XQE dma address error
> > 
> > The dma addresses of EQE and AEQE are wrong after migration and
> > results in guest kernel-mode encryption services  failure.
> > Comparing the definition of hardware registers, we found that
> > there was an error when the data read from the register was
> > combined into an address. Therefore, the address combination
> > sequence needs to be corrected.
> > 
> > Even after fixing the above problem, we still have an issue
> > where the Guest from an old kernel can get migrated to
> > new kernel and may result in wrong data.
> > 
> > In order to ensure that the address is correct after migration,
> > if an old magic number is detected, the dma address needs to be
> > updated.
> > 
> > Fixes:b0eed085903e("hisi_acc_vfio_pci: Add support for VFIO live
> > migration")
> > Signed-off-by: Longfang Liu <liulongfang@huawei.com>
> > ---
> >  .../vfio/pci/hisilicon/hisi_acc_vfio_pci.c    | 34 +++++++++++++++----
> >  .../vfio/pci/hisilicon/hisi_acc_vfio_pci.h    |  9 ++++-
> >  2 files changed, 36 insertions(+), 7 deletions(-)
> > 
> > diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> > b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> > index 451c639299eb..8518efea3a52 100644
> > --- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> > +++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> > @@ -350,6 +350,27 @@ static int vf_qm_func_stop(struct hisi_qm *qm)
> >  	return hisi_qm_mb(qm, QM_MB_CMD_PAUSE_QM, 0, 0, 0);
> >  }
> > 
> > +static int vf_qm_magic_check(struct acc_vf_data *vf_data)
> > +{
> > +	switch (vf_data->acc_magic) {
> > +	case ACC_DEV_MAGIC_V2:
> > +		break;
> > +	case ACC_DEV_MAGIC_V1:
> > +		/* Correct dma address */
> > +		vf_data->eqe_dma = vf_data-  
> > >qm_eqc_dw[QM_XQC_ADDR_HIGH];  
> > +		vf_data->eqe_dma <<= QM_XQC_ADDR_OFFSET;
> > +		vf_data->eqe_dma |= vf_data-  
> > >qm_eqc_dw[QM_XQC_ADDR_LOW];  
> > +		vf_data->aeqe_dma = vf_data-  
> > >qm_aeqc_dw[QM_XQC_ADDR_HIGH];  
> > +		vf_data->aeqe_dma <<= QM_XQC_ADDR_OFFSET;
> > +		vf_data->aeqe_dma |= vf_data-  
> > >qm_aeqc_dw[QM_XQC_ADDR_LOW];  
> > +		break;
> > +	default:
> > +		return -EINVAL;
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> >  static int vf_qm_check_match(struct hisi_acc_vf_core_device
> > *hisi_acc_vdev,
> >  			     struct hisi_acc_vf_migration_file *migf)
> >  {
> > @@ -363,7 +384,8 @@ static int vf_qm_check_match(struct
> > hisi_acc_vf_core_device *hisi_acc_vdev,
> >  	if (migf->total_length < QM_MATCH_SIZE || hisi_acc_vdev-  
> > >match_done)  
> >  		return 0;
> > 
> > -	if (vf_data->acc_magic != ACC_DEV_MAGIC) {
> > +	ret = vf_qm_magic_check(vf_data);
> > +	if (ret) {
> >  		dev_err(dev, "failed to match ACC_DEV_MAGIC\n");
> >  		return -EINVAL;
> >  	}
> > @@ -418,7 +440,7 @@ static int vf_qm_get_match_data(struct
> > hisi_acc_vf_core_device *hisi_acc_vdev,
> >  	int vf_id = hisi_acc_vdev->vf_id;
> >  	int ret;
> > 
> > -	vf_data->acc_magic = ACC_DEV_MAGIC;
> > +	vf_data->acc_magic = ACC_DEV_MAGIC_V2;
> >  	/* Save device id */
> >  	vf_data->dev_id = hisi_acc_vdev->vf_dev->device;
> > 
> > @@ -496,12 +518,12 @@ static int vf_qm_read_data(struct hisi_qm
> > *vf_qm, struct acc_vf_data *vf_data)
> >  		return -EINVAL;
> > 
> >  	/* Every reg is 32 bit, the dma address is 64 bit. */
> > -	vf_data->eqe_dma = vf_data->qm_eqc_dw[1];
> > +	vf_data->eqe_dma = vf_data->qm_eqc_dw[QM_XQC_ADDR_HIGH];
> >  	vf_data->eqe_dma <<= QM_XQC_ADDR_OFFSET;
> > -	vf_data->eqe_dma |= vf_data->qm_eqc_dw[0];
> > -	vf_data->aeqe_dma = vf_data->qm_aeqc_dw[1];
> > +	vf_data->eqe_dma |= vf_data->qm_eqc_dw[QM_XQC_ADDR_LOW];
> > +	vf_data->aeqe_dma = vf_data-  
> > >qm_aeqc_dw[QM_XQC_ADDR_HIGH];  
> >  	vf_data->aeqe_dma <<= QM_XQC_ADDR_OFFSET;
> > -	vf_data->aeqe_dma |= vf_data->qm_aeqc_dw[0];
> > +	vf_data->aeqe_dma |= vf_data-  
> > >qm_aeqc_dw[QM_XQC_ADDR_LOW];  
> > 
> >  	/* Through SQC_BT/CQC_BT to get sqc and cqc address */
> >  	ret = qm_get_sqc(vf_qm, &vf_data->sqc_dma);
> > diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h
> > b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h
> > index 245d7537b2bc..2afce68f5a34 100644
> > --- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h
> > +++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h
> > @@ -39,6 +39,9 @@
> >  #define QM_REG_ADDR_OFFSET	0x0004
> > 
> >  #define QM_XQC_ADDR_OFFSET	32U
> > +#define QM_XQC_ADDR_LOW	0x1
> > +#define QM_XQC_ADDR_HIGH	0x2
> > +
> >  #define QM_VF_AEQ_INT_MASK	0x0004
> >  #define QM_VF_EQ_INT_MASK	0x000c
> >  #define QM_IFC_INT_SOURCE_V	0x0020
> > @@ -50,10 +53,14 @@
> >  #define QM_EQC_DW0		0X8000
> >  #define QM_AEQC_DW0		0X8020
> > 
> > +enum acc_magic_num {
> > +	ACC_DEV_MAGIC_V1 = 0XCDCDCDCDFEEDAACC,
> > +	ACC_DEV_MAGIC_V2 = 0xAACCFEEDDECADEDE,  
> 
> 
> I think we have discussed this before that having some kind of 
> version info embed into magic_num will be beneficial going 
> forward. ie, may be use the last 4 bytes for denoting version.
> 
> ACC_DEV_MAGIC_V2 = 0xAACCFEEDDECA0002
> 
> The reason being, otherwise we have to come up with a random
> magic each time when a fix like this is required in future.

Overloading the magic value like this is a bit strange.  Typically
the magic value should be the cookie that identifies the data blob as
generated by this driver and a separate version field would identify
any content or format changes.  In the mtty driver I included a magic
field along with major and minor versions to provide this flexibility.
I wonder why we wouldn't do something similar here rather than create
this combination magic/version field.  It's easy enough to append a
couple fields onto the structure or redefine a v2 structure to use
going forward.  There's even a padding field in the structure already
that could be repurposed.

I see v3 went on to modify the v2 magic as described here, but there's
nothing to suggest the use of these latter bytes as anything other than
a slightly different random magic value.  Minimally the suggestion
should have resulted in 6-bytes of magic and 2-bytes of version (iiuc),
but there's no code to support that nor would I recommend that layout
versus the alternatives above.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 1/5] hisi_acc_vfio_pci: fix XQE dma address error
  2025-01-02 22:30     ` Alex Williamson
@ 2025-01-13  9:43       ` liulongfang
  2025-01-13 13:34         ` Alex Williamson
  0 siblings, 1 reply; 21+ messages in thread
From: liulongfang @ 2025-01-13  9:43 UTC (permalink / raw)
  To: Alex Williamson, Shameerali Kolothum Thodi
  Cc: jgg@nvidia.com, Jonathan Cameron, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, linuxarm@openeuler.org

On 2025/1/3 6:30, Alex Williamson wrote:
> On Thu, 19 Dec 2024 10:01:03 +0000
> Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com> wrote:
> 
>>> -----Original Message-----
>>> From: liulongfang <liulongfang@huawei.com>
>>> Sent: Thursday, December 19, 2024 9:18 AM
>>> To: alex.williamson@redhat.com; jgg@nvidia.com; Shameerali Kolothum
>>> Thodi <shameerali.kolothum.thodi@huawei.com>; Jonathan Cameron
>>> <jonathan.cameron@huawei.com>
>>> Cc: kvm@vger.kernel.org; linux-kernel@vger.kernel.org;
>>> linuxarm@openeuler.org; liulongfang <liulongfang@huawei.com>
>>> Subject: [PATCH v2 1/5] hisi_acc_vfio_pci: fix XQE dma address error
>>>
>>> The dma addresses of EQE and AEQE are wrong after migration and
>>> results in guest kernel-mode encryption services  failure.
>>> Comparing the definition of hardware registers, we found that
>>> there was an error when the data read from the register was
>>> combined into an address. Therefore, the address combination
>>> sequence needs to be corrected.
>>>
>>> Even after fixing the above problem, we still have an issue
>>> where the Guest from an old kernel can get migrated to
>>> new kernel and may result in wrong data.
>>>
>>> In order to ensure that the address is correct after migration,
>>> if an old magic number is detected, the dma address needs to be
>>> updated.
>>>
>>> Fixes:b0eed085903e("hisi_acc_vfio_pci: Add support for VFIO live
>>> migration")
>>> Signed-off-by: Longfang Liu <liulongfang@huawei.com>
>>> ---
>>>  .../vfio/pci/hisilicon/hisi_acc_vfio_pci.c    | 34 +++++++++++++++----
>>>  .../vfio/pci/hisilicon/hisi_acc_vfio_pci.h    |  9 ++++-
>>>  2 files changed, 36 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
>>> b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
>>> index 451c639299eb..8518efea3a52 100644
>>> --- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
>>> +++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
>>> @@ -350,6 +350,27 @@ static int vf_qm_func_stop(struct hisi_qm *qm)
>>>  	return hisi_qm_mb(qm, QM_MB_CMD_PAUSE_QM, 0, 0, 0);
>>>  }
>>>
>>> +static int vf_qm_magic_check(struct acc_vf_data *vf_data)
>>> +{
>>> +	switch (vf_data->acc_magic) {
>>> +	case ACC_DEV_MAGIC_V2:
>>> +		break;
>>> +	case ACC_DEV_MAGIC_V1:
>>> +		/* Correct dma address */
>>> +		vf_data->eqe_dma = vf_data-  
>>>> qm_eqc_dw[QM_XQC_ADDR_HIGH];  
>>> +		vf_data->eqe_dma <<= QM_XQC_ADDR_OFFSET;
>>> +		vf_data->eqe_dma |= vf_data-  
>>>> qm_eqc_dw[QM_XQC_ADDR_LOW];  
>>> +		vf_data->aeqe_dma = vf_data-  
>>>> qm_aeqc_dw[QM_XQC_ADDR_HIGH];  
>>> +		vf_data->aeqe_dma <<= QM_XQC_ADDR_OFFSET;
>>> +		vf_data->aeqe_dma |= vf_data-  
>>>> qm_aeqc_dw[QM_XQC_ADDR_LOW];  
>>> +		break;
>>> +	default:
>>> +		return -EINVAL;
>>> +	}
>>> +
>>> +	return 0;
>>> +}
>>> +
>>>  static int vf_qm_check_match(struct hisi_acc_vf_core_device
>>> *hisi_acc_vdev,
>>>  			     struct hisi_acc_vf_migration_file *migf)
>>>  {
>>> @@ -363,7 +384,8 @@ static int vf_qm_check_match(struct
>>> hisi_acc_vf_core_device *hisi_acc_vdev,
>>>  	if (migf->total_length < QM_MATCH_SIZE || hisi_acc_vdev-  
>>>> match_done)  
>>>  		return 0;
>>>
>>> -	if (vf_data->acc_magic != ACC_DEV_MAGIC) {
>>> +	ret = vf_qm_magic_check(vf_data);
>>> +	if (ret) {
>>>  		dev_err(dev, "failed to match ACC_DEV_MAGIC\n");
>>>  		return -EINVAL;
>>>  	}
>>> @@ -418,7 +440,7 @@ static int vf_qm_get_match_data(struct
>>> hisi_acc_vf_core_device *hisi_acc_vdev,
>>>  	int vf_id = hisi_acc_vdev->vf_id;
>>>  	int ret;
>>>
>>> -	vf_data->acc_magic = ACC_DEV_MAGIC;
>>> +	vf_data->acc_magic = ACC_DEV_MAGIC_V2;
>>>  	/* Save device id */
>>>  	vf_data->dev_id = hisi_acc_vdev->vf_dev->device;
>>>
>>> @@ -496,12 +518,12 @@ static int vf_qm_read_data(struct hisi_qm
>>> *vf_qm, struct acc_vf_data *vf_data)
>>>  		return -EINVAL;
>>>
>>>  	/* Every reg is 32 bit, the dma address is 64 bit. */
>>> -	vf_data->eqe_dma = vf_data->qm_eqc_dw[1];
>>> +	vf_data->eqe_dma = vf_data->qm_eqc_dw[QM_XQC_ADDR_HIGH];
>>>  	vf_data->eqe_dma <<= QM_XQC_ADDR_OFFSET;
>>> -	vf_data->eqe_dma |= vf_data->qm_eqc_dw[0];
>>> -	vf_data->aeqe_dma = vf_data->qm_aeqc_dw[1];
>>> +	vf_data->eqe_dma |= vf_data->qm_eqc_dw[QM_XQC_ADDR_LOW];
>>> +	vf_data->aeqe_dma = vf_data-  
>>>> qm_aeqc_dw[QM_XQC_ADDR_HIGH];  
>>>  	vf_data->aeqe_dma <<= QM_XQC_ADDR_OFFSET;
>>> -	vf_data->aeqe_dma |= vf_data->qm_aeqc_dw[0];
>>> +	vf_data->aeqe_dma |= vf_data-  
>>>> qm_aeqc_dw[QM_XQC_ADDR_LOW];  
>>>
>>>  	/* Through SQC_BT/CQC_BT to get sqc and cqc address */
>>>  	ret = qm_get_sqc(vf_qm, &vf_data->sqc_dma);
>>> diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h
>>> b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h
>>> index 245d7537b2bc..2afce68f5a34 100644
>>> --- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h
>>> +++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h
>>> @@ -39,6 +39,9 @@
>>>  #define QM_REG_ADDR_OFFSET	0x0004
>>>
>>>  #define QM_XQC_ADDR_OFFSET	32U
>>> +#define QM_XQC_ADDR_LOW	0x1
>>> +#define QM_XQC_ADDR_HIGH	0x2
>>> +
>>>  #define QM_VF_AEQ_INT_MASK	0x0004
>>>  #define QM_VF_EQ_INT_MASK	0x000c
>>>  #define QM_IFC_INT_SOURCE_V	0x0020
>>> @@ -50,10 +53,14 @@
>>>  #define QM_EQC_DW0		0X8000
>>>  #define QM_AEQC_DW0		0X8020
>>>
>>> +enum acc_magic_num {
>>> +	ACC_DEV_MAGIC_V1 = 0XCDCDCDCDFEEDAACC,
>>> +	ACC_DEV_MAGIC_V2 = 0xAACCFEEDDECADEDE,  
>>
>>
>> I think we have discussed this before that having some kind of 
>> version info embed into magic_num will be beneficial going 
>> forward. ie, may be use the last 4 bytes for denoting version.
>>
>> ACC_DEV_MAGIC_V2 = 0xAACCFEEDDECA0002
>>
>> The reason being, otherwise we have to come up with a random
>> magic each time when a fix like this is required in future.
> 
> Overloading the magic value like this is a bit strange.  Typically
> the magic value should be the cookie that identifies the data blob as
> generated by this driver and a separate version field would identify
> any content or format changes.  In the mtty driver I included a magic
> field along with major and minor versions to provide this flexibility.
> I wonder why we wouldn't do something similar here rather than create
> this combination magic/version field.  It's easy enough to append a
> couple fields onto the structure or redefine a v2 structure to use
> going forward.  There's even a padding field in the structure already
> that could be repurposed.
>

If we add the major and minor version numbers like mtty driver, the current
data structure needs to be changed, and this change also needs to be compatible
with the old and new versions.
And the old version does not have this information. The new version cannot
be adapted when it is imported.

Now in this patch to fix this problem, we only need to update a magic number,
and then the problem is corrected. Migration between old and new versions can
also be fixed:

Old --> Old, wrong, but we can't stop it;
Old --> New, corrected migrated incorrect data, recovery successful;
New --> New, correct, recovery successful;
New --> Old, correct, recovery successful.

As for the compatibility issues between old and new versions in the future,
we do not need to reserve version numbers to deal with them now. Because
before encountering specific problems, our design may be redundant.

Thanks.
Longfang.

> I see v3 went on to modify the v2 magic as described here, but there's
> nothing to suggest the use of these latter bytes as anything other than
> a slightly different random magic value.  Minimally the suggestion
> should have resulted in 6-bytes of magic and 2-bytes of version (iiuc),
> but there's no code to support that nor would I recommend that layout
> versus the alternatives above.  Thanks,
> 
> Alex
> 
> 
> .
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 1/5] hisi_acc_vfio_pci: fix XQE dma address error
  2025-01-13  9:43       ` liulongfang
@ 2025-01-13 13:34         ` Alex Williamson
  2025-01-13 19:45           ` Alex Williamson
  0 siblings, 1 reply; 21+ messages in thread
From: Alex Williamson @ 2025-01-13 13:34 UTC (permalink / raw)
  To: liulongfang
  Cc: Shameerali Kolothum Thodi, jgg@nvidia.com, Jonathan Cameron,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	linuxarm@openeuler.org

On Mon, 13 Jan 2025 17:43:09 +0800
liulongfang <liulongfang@huawei.com> wrote:

> On 2025/1/3 6:30, Alex Williamson wrote:
> > On Thu, 19 Dec 2024 10:01:03 +0000
> > Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com> wrote:
> >   
> >>> -----Original Message-----
> >>> From: liulongfang <liulongfang@huawei.com>
> >>> Sent: Thursday, December 19, 2024 9:18 AM
> >>> To: alex.williamson@redhat.com; jgg@nvidia.com; Shameerali Kolothum
> >>> Thodi <shameerali.kolothum.thodi@huawei.com>; Jonathan Cameron
> >>> <jonathan.cameron@huawei.com>
> >>> Cc: kvm@vger.kernel.org; linux-kernel@vger.kernel.org;
> >>> linuxarm@openeuler.org; liulongfang <liulongfang@huawei.com>
> >>> Subject: [PATCH v2 1/5] hisi_acc_vfio_pci: fix XQE dma address error
> >>>
> >>> The dma addresses of EQE and AEQE are wrong after migration and
> >>> results in guest kernel-mode encryption services  failure.
> >>> Comparing the definition of hardware registers, we found that
> >>> there was an error when the data read from the register was
> >>> combined into an address. Therefore, the address combination
> >>> sequence needs to be corrected.
> >>>
> >>> Even after fixing the above problem, we still have an issue
> >>> where the Guest from an old kernel can get migrated to
> >>> new kernel and may result in wrong data.
> >>>
> >>> In order to ensure that the address is correct after migration,
> >>> if an old magic number is detected, the dma address needs to be
> >>> updated.
> >>>
> >>> Fixes:b0eed085903e("hisi_acc_vfio_pci: Add support for VFIO live
> >>> migration")
> >>> Signed-off-by: Longfang Liu <liulongfang@huawei.com>
> >>> ---
> >>>  .../vfio/pci/hisilicon/hisi_acc_vfio_pci.c    | 34 +++++++++++++++----
> >>>  .../vfio/pci/hisilicon/hisi_acc_vfio_pci.h    |  9 ++++-
> >>>  2 files changed, 36 insertions(+), 7 deletions(-)
> >>>
> >>> diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> >>> b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> >>> index 451c639299eb..8518efea3a52 100644
> >>> --- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> >>> +++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> >>> @@ -350,6 +350,27 @@ static int vf_qm_func_stop(struct hisi_qm *qm)
> >>>  	return hisi_qm_mb(qm, QM_MB_CMD_PAUSE_QM, 0, 0, 0);
> >>>  }
> >>>
> >>> +static int vf_qm_magic_check(struct acc_vf_data *vf_data)
> >>> +{
> >>> +	switch (vf_data->acc_magic) {
> >>> +	case ACC_DEV_MAGIC_V2:
> >>> +		break;
> >>> +	case ACC_DEV_MAGIC_V1:
> >>> +		/* Correct dma address */
> >>> +		vf_data->eqe_dma = vf_data-    
> >>>> qm_eqc_dw[QM_XQC_ADDR_HIGH];    
> >>> +		vf_data->eqe_dma <<= QM_XQC_ADDR_OFFSET;
> >>> +		vf_data->eqe_dma |= vf_data-    
> >>>> qm_eqc_dw[QM_XQC_ADDR_LOW];    
> >>> +		vf_data->aeqe_dma = vf_data-    
> >>>> qm_aeqc_dw[QM_XQC_ADDR_HIGH];    
> >>> +		vf_data->aeqe_dma <<= QM_XQC_ADDR_OFFSET;
> >>> +		vf_data->aeqe_dma |= vf_data-    
> >>>> qm_aeqc_dw[QM_XQC_ADDR_LOW];    
> >>> +		break;
> >>> +	default:
> >>> +		return -EINVAL;
> >>> +	}
> >>> +
> >>> +	return 0;
> >>> +}
> >>> +
> >>>  static int vf_qm_check_match(struct hisi_acc_vf_core_device
> >>> *hisi_acc_vdev,
> >>>  			     struct hisi_acc_vf_migration_file *migf)
> >>>  {
> >>> @@ -363,7 +384,8 @@ static int vf_qm_check_match(struct
> >>> hisi_acc_vf_core_device *hisi_acc_vdev,
> >>>  	if (migf->total_length < QM_MATCH_SIZE || hisi_acc_vdev-    
> >>>> match_done)    
> >>>  		return 0;
> >>>
> >>> -	if (vf_data->acc_magic != ACC_DEV_MAGIC) {
> >>> +	ret = vf_qm_magic_check(vf_data);
> >>> +	if (ret) {
> >>>  		dev_err(dev, "failed to match ACC_DEV_MAGIC\n");
> >>>  		return -EINVAL;
> >>>  	}
> >>> @@ -418,7 +440,7 @@ static int vf_qm_get_match_data(struct
> >>> hisi_acc_vf_core_device *hisi_acc_vdev,
> >>>  	int vf_id = hisi_acc_vdev->vf_id;
> >>>  	int ret;
> >>>
> >>> -	vf_data->acc_magic = ACC_DEV_MAGIC;
> >>> +	vf_data->acc_magic = ACC_DEV_MAGIC_V2;
> >>>  	/* Save device id */
> >>>  	vf_data->dev_id = hisi_acc_vdev->vf_dev->device;
> >>>
> >>> @@ -496,12 +518,12 @@ static int vf_qm_read_data(struct hisi_qm
> >>> *vf_qm, struct acc_vf_data *vf_data)
> >>>  		return -EINVAL;
> >>>
> >>>  	/* Every reg is 32 bit, the dma address is 64 bit. */
> >>> -	vf_data->eqe_dma = vf_data->qm_eqc_dw[1];
> >>> +	vf_data->eqe_dma = vf_data->qm_eqc_dw[QM_XQC_ADDR_HIGH];
> >>>  	vf_data->eqe_dma <<= QM_XQC_ADDR_OFFSET;
> >>> -	vf_data->eqe_dma |= vf_data->qm_eqc_dw[0];
> >>> -	vf_data->aeqe_dma = vf_data->qm_aeqc_dw[1];
> >>> +	vf_data->eqe_dma |= vf_data->qm_eqc_dw[QM_XQC_ADDR_LOW];
> >>> +	vf_data->aeqe_dma = vf_data-    
> >>>> qm_aeqc_dw[QM_XQC_ADDR_HIGH];    
> >>>  	vf_data->aeqe_dma <<= QM_XQC_ADDR_OFFSET;
> >>> -	vf_data->aeqe_dma |= vf_data->qm_aeqc_dw[0];
> >>> +	vf_data->aeqe_dma |= vf_data-    
> >>>> qm_aeqc_dw[QM_XQC_ADDR_LOW];    
> >>>
> >>>  	/* Through SQC_BT/CQC_BT to get sqc and cqc address */
> >>>  	ret = qm_get_sqc(vf_qm, &vf_data->sqc_dma);
> >>> diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h
> >>> b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h
> >>> index 245d7537b2bc..2afce68f5a34 100644
> >>> --- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h
> >>> +++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h
> >>> @@ -39,6 +39,9 @@
> >>>  #define QM_REG_ADDR_OFFSET	0x0004
> >>>
> >>>  #define QM_XQC_ADDR_OFFSET	32U
> >>> +#define QM_XQC_ADDR_LOW	0x1
> >>> +#define QM_XQC_ADDR_HIGH	0x2
> >>> +
> >>>  #define QM_VF_AEQ_INT_MASK	0x0004
> >>>  #define QM_VF_EQ_INT_MASK	0x000c
> >>>  #define QM_IFC_INT_SOURCE_V	0x0020
> >>> @@ -50,10 +53,14 @@
> >>>  #define QM_EQC_DW0		0X8000
> >>>  #define QM_AEQC_DW0		0X8020
> >>>
> >>> +enum acc_magic_num {
> >>> +	ACC_DEV_MAGIC_V1 = 0XCDCDCDCDFEEDAACC,
> >>> +	ACC_DEV_MAGIC_V2 = 0xAACCFEEDDECADEDE,    
> >>
> >>
> >> I think we have discussed this before that having some kind of 
> >> version info embed into magic_num will be beneficial going 
> >> forward. ie, may be use the last 4 bytes for denoting version.
> >>
> >> ACC_DEV_MAGIC_V2 = 0xAACCFEEDDECA0002
> >>
> >> The reason being, otherwise we have to come up with a random
> >> magic each time when a fix like this is required in future.  
> > 
> > Overloading the magic value like this is a bit strange.  Typically
> > the magic value should be the cookie that identifies the data blob as
> > generated by this driver and a separate version field would identify
> > any content or format changes.  In the mtty driver I included a magic
> > field along with major and minor versions to provide this flexibility.
> > I wonder why we wouldn't do something similar here rather than create
> > this combination magic/version field.  It's easy enough to append a
> > couple fields onto the structure or redefine a v2 structure to use
> > going forward.  There's even a padding field in the structure already
> > that could be repurposed.
> >  
> 
> If we add the major and minor version numbers like mtty driver, the current
> data structure needs to be changed, and this change also needs to be compatible
> with the old and new versions.
> And the old version does not have this information. The new version cannot
> be adapted when it is imported.

Are you suggesting that we cannot use a different data structure based
on the magic value?  That's rather the point of the magic value.

> Now in this patch to fix this problem, we only need to update a magic number,
> and then the problem is corrected. Migration between old and new versions can
> also be fixed:
> 
> Old --> Old, wrong, but we can't stop it;
> Old --> New, corrected migrated incorrect data, recovery successful;
> New --> New, correct, recovery successful;
> New --> Old, correct, recovery successful.
> 
> As for the compatibility issues between old and new versions in the future,
> we do not need to reserve version numbers to deal with them now. Because
> before encountering specific problems, our design may be redundant.

A magic value + version number would prevent the need to replace the
magic value every time an issue is encountered, which I think was also
Shameer's commit, which is not addressed by forcing the formatting of a
portion of the magic value.  None of what you're trying to do here
precludes a new data structure for the new magic value or defining the
padding fields for different use cases.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 1/5] hisi_acc_vfio_pci: fix XQE dma address error
  2025-01-13 13:34         ` Alex Williamson
@ 2025-01-13 19:45           ` Alex Williamson
  2025-01-14  3:17             ` liulongfang
  0 siblings, 1 reply; 21+ messages in thread
From: Alex Williamson @ 2025-01-13 19:45 UTC (permalink / raw)
  To: liulongfang
  Cc: Shameerali Kolothum Thodi, jgg@nvidia.com, Jonathan Cameron,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	linuxarm@openeuler.org

On Mon, 13 Jan 2025 08:34:41 -0500
Alex Williamson <alex.williamson@redhat.com> wrote:

> On Mon, 13 Jan 2025 17:43:09 +0800
> liulongfang <liulongfang@huawei.com> wrote:
> 
> > On 2025/1/3 6:30, Alex Williamson wrote:  
> > > On Thu, 19 Dec 2024 10:01:03 +0000
> > > Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com> wrote:
> > >     
> > >>> -----Original Message-----
> > >>> From: liulongfang <liulongfang@huawei.com>
> > >>> Sent: Thursday, December 19, 2024 9:18 AM
> > >>> To: alex.williamson@redhat.com; jgg@nvidia.com; Shameerali Kolothum
> > >>> Thodi <shameerali.kolothum.thodi@huawei.com>; Jonathan Cameron
> > >>> <jonathan.cameron@huawei.com>
> > >>> Cc: kvm@vger.kernel.org; linux-kernel@vger.kernel.org;
> > >>> linuxarm@openeuler.org; liulongfang <liulongfang@huawei.com>
> > >>> Subject: [PATCH v2 1/5] hisi_acc_vfio_pci: fix XQE dma address error
> > >>>
> > >>> The dma addresses of EQE and AEQE are wrong after migration and
> > >>> results in guest kernel-mode encryption services  failure.
> > >>> Comparing the definition of hardware registers, we found that
> > >>> there was an error when the data read from the register was
> > >>> combined into an address. Therefore, the address combination
> > >>> sequence needs to be corrected.
> > >>>
> > >>> Even after fixing the above problem, we still have an issue
> > >>> where the Guest from an old kernel can get migrated to
> > >>> new kernel and may result in wrong data.
> > >>>
> > >>> In order to ensure that the address is correct after migration,
> > >>> if an old magic number is detected, the dma address needs to be
> > >>> updated.
> > >>>
> > >>> Fixes:b0eed085903e("hisi_acc_vfio_pci: Add support for VFIO live
> > >>> migration")
> > >>> Signed-off-by: Longfang Liu <liulongfang@huawei.com>
> > >>> ---
> > >>>  .../vfio/pci/hisilicon/hisi_acc_vfio_pci.c    | 34 +++++++++++++++----
> > >>>  .../vfio/pci/hisilicon/hisi_acc_vfio_pci.h    |  9 ++++-
> > >>>  2 files changed, 36 insertions(+), 7 deletions(-)
> > >>>
> > >>> diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> > >>> b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> > >>> index 451c639299eb..8518efea3a52 100644
> > >>> --- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> > >>> +++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> > >>> @@ -350,6 +350,27 @@ static int vf_qm_func_stop(struct hisi_qm *qm)
> > >>>  	return hisi_qm_mb(qm, QM_MB_CMD_PAUSE_QM, 0, 0, 0);
> > >>>  }
> > >>>
> > >>> +static int vf_qm_magic_check(struct acc_vf_data *vf_data)
> > >>> +{
> > >>> +	switch (vf_data->acc_magic) {
> > >>> +	case ACC_DEV_MAGIC_V2:
> > >>> +		break;
> > >>> +	case ACC_DEV_MAGIC_V1:
> > >>> +		/* Correct dma address */
> > >>> +		vf_data->eqe_dma = vf_data-      
> > >>>> qm_eqc_dw[QM_XQC_ADDR_HIGH];      
> > >>> +		vf_data->eqe_dma <<= QM_XQC_ADDR_OFFSET;
> > >>> +		vf_data->eqe_dma |= vf_data-      
> > >>>> qm_eqc_dw[QM_XQC_ADDR_LOW];      
> > >>> +		vf_data->aeqe_dma = vf_data-      
> > >>>> qm_aeqc_dw[QM_XQC_ADDR_HIGH];      
> > >>> +		vf_data->aeqe_dma <<= QM_XQC_ADDR_OFFSET;
> > >>> +		vf_data->aeqe_dma |= vf_data-      
> > >>>> qm_aeqc_dw[QM_XQC_ADDR_LOW];      
> > >>> +		break;
> > >>> +	default:
> > >>> +		return -EINVAL;
> > >>> +	}
> > >>> +
> > >>> +	return 0;
> > >>> +}
> > >>> +
> > >>>  static int vf_qm_check_match(struct hisi_acc_vf_core_device
> > >>> *hisi_acc_vdev,
> > >>>  			     struct hisi_acc_vf_migration_file *migf)
> > >>>  {
> > >>> @@ -363,7 +384,8 @@ static int vf_qm_check_match(struct
> > >>> hisi_acc_vf_core_device *hisi_acc_vdev,
> > >>>  	if (migf->total_length < QM_MATCH_SIZE || hisi_acc_vdev-      
> > >>>> match_done)      
> > >>>  		return 0;
> > >>>
> > >>> -	if (vf_data->acc_magic != ACC_DEV_MAGIC) {
> > >>> +	ret = vf_qm_magic_check(vf_data);
> > >>> +	if (ret) {
> > >>>  		dev_err(dev, "failed to match ACC_DEV_MAGIC\n");
> > >>>  		return -EINVAL;
> > >>>  	}
> > >>> @@ -418,7 +440,7 @@ static int vf_qm_get_match_data(struct
> > >>> hisi_acc_vf_core_device *hisi_acc_vdev,
> > >>>  	int vf_id = hisi_acc_vdev->vf_id;
> > >>>  	int ret;
> > >>>
> > >>> -	vf_data->acc_magic = ACC_DEV_MAGIC;
> > >>> +	vf_data->acc_magic = ACC_DEV_MAGIC_V2;
> > >>>  	/* Save device id */
> > >>>  	vf_data->dev_id = hisi_acc_vdev->vf_dev->device;
> > >>>
> > >>> @@ -496,12 +518,12 @@ static int vf_qm_read_data(struct hisi_qm
> > >>> *vf_qm, struct acc_vf_data *vf_data)
> > >>>  		return -EINVAL;
> > >>>
> > >>>  	/* Every reg is 32 bit, the dma address is 64 bit. */
> > >>> -	vf_data->eqe_dma = vf_data->qm_eqc_dw[1];
> > >>> +	vf_data->eqe_dma = vf_data->qm_eqc_dw[QM_XQC_ADDR_HIGH];
> > >>>  	vf_data->eqe_dma <<= QM_XQC_ADDR_OFFSET;
> > >>> -	vf_data->eqe_dma |= vf_data->qm_eqc_dw[0];
> > >>> -	vf_data->aeqe_dma = vf_data->qm_aeqc_dw[1];
> > >>> +	vf_data->eqe_dma |= vf_data->qm_eqc_dw[QM_XQC_ADDR_LOW];
> > >>> +	vf_data->aeqe_dma = vf_data-      
> > >>>> qm_aeqc_dw[QM_XQC_ADDR_HIGH];      
> > >>>  	vf_data->aeqe_dma <<= QM_XQC_ADDR_OFFSET;
> > >>> -	vf_data->aeqe_dma |= vf_data->qm_aeqc_dw[0];
> > >>> +	vf_data->aeqe_dma |= vf_data-      
> > >>>> qm_aeqc_dw[QM_XQC_ADDR_LOW];      
> > >>>
> > >>>  	/* Through SQC_BT/CQC_BT to get sqc and cqc address */
> > >>>  	ret = qm_get_sqc(vf_qm, &vf_data->sqc_dma);
> > >>> diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h
> > >>> b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h
> > >>> index 245d7537b2bc..2afce68f5a34 100644
> > >>> --- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h
> > >>> +++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h
> > >>> @@ -39,6 +39,9 @@
> > >>>  #define QM_REG_ADDR_OFFSET	0x0004
> > >>>
> > >>>  #define QM_XQC_ADDR_OFFSET	32U
> > >>> +#define QM_XQC_ADDR_LOW	0x1
> > >>> +#define QM_XQC_ADDR_HIGH	0x2
> > >>> +
> > >>>  #define QM_VF_AEQ_INT_MASK	0x0004
> > >>>  #define QM_VF_EQ_INT_MASK	0x000c
> > >>>  #define QM_IFC_INT_SOURCE_V	0x0020
> > >>> @@ -50,10 +53,14 @@
> > >>>  #define QM_EQC_DW0		0X8000
> > >>>  #define QM_AEQC_DW0		0X8020
> > >>>
> > >>> +enum acc_magic_num {
> > >>> +	ACC_DEV_MAGIC_V1 = 0XCDCDCDCDFEEDAACC,
> > >>> +	ACC_DEV_MAGIC_V2 = 0xAACCFEEDDECADEDE,      
> > >>
> > >>
> > >> I think we have discussed this before that having some kind of 
> > >> version info embed into magic_num will be beneficial going 
> > >> forward. ie, may be use the last 4 bytes for denoting version.
> > >>
> > >> ACC_DEV_MAGIC_V2 = 0xAACCFEEDDECA0002
> > >>
> > >> The reason being, otherwise we have to come up with a random
> > >> magic each time when a fix like this is required in future.    
> > > 
> > > Overloading the magic value like this is a bit strange.  Typically
> > > the magic value should be the cookie that identifies the data blob as
> > > generated by this driver and a separate version field would identify
> > > any content or format changes.  In the mtty driver I included a magic
> > > field along with major and minor versions to provide this flexibility.
> > > I wonder why we wouldn't do something similar here rather than create
> > > this combination magic/version field.  It's easy enough to append a
> > > couple fields onto the structure or redefine a v2 structure to use
> > > going forward.  There's even a padding field in the structure already
> > > that could be repurposed.
> > >    
> > 
> > If we add the major and minor version numbers like mtty driver, the current
> > data structure needs to be changed, and this change also needs to be compatible
> > with the old and new versions.
> > And the old version does not have this information. The new version cannot
> > be adapted when it is imported.  
> 
> Are you suggesting that we cannot use a different data structure based
> on the magic value?  That's rather the point of the magic value.
> 
> > Now in this patch to fix this problem, we only need to update a magic number,
> > and then the problem is corrected. Migration between old and new versions can
> > also be fixed:
> > 
> > Old --> Old, wrong, but we can't stop it;
> > Old --> New, corrected migrated incorrect data, recovery successful;
> > New --> New, correct, recovery successful;
> > New --> Old, correct, recovery successful.

BTW, New->Old is not possible given the current old code and this
series doesn't enable it.  New magic is saved, old restore could should
reject it:

static int vf_qm_check_match(struct hisi_acc_vf_core_device *hisi_acc_vdev,
                             struct hisi_acc_vf_migration_file *migf)
{       
...
        if (vf_data->acc_magic != ACC_DEV_MAGIC) { 
                dev_err(dev, "failed to match ACC_DEV_MAGIC\n");
                return -EINVAL; 
        }

From this patch:

@@ -418,7 +440,7 @@ static int vf_qm_get_match_data(struct hisi_acc_vf_core_device *hisi_acc_vdev,
 	int vf_id = hisi_acc_vdev->vf_id;
 	int ret;
 
-	vf_data->acc_magic = ACC_DEV_MAGIC;
+	vf_data->acc_magic = ACC_DEV_MAGIC_V2;
 	/* Save device id */
 	vf_data->dev_id = hisi_acc_vdev->vf_dev->device;
 
Thanks,
Alex

> > 
> > As for the compatibility issues between old and new versions in the future,
> > we do not need to reserve version numbers to deal with them now. Because
> > before encountering specific problems, our design may be redundant.  
> 
> A magic value + version number would prevent the need to replace the
> magic value every time an issue is encountered, which I think was also
> Shameer's commit, which is not addressed by forcing the formatting of a
> portion of the magic value.  None of what you're trying to do here
> precludes a new data structure for the new magic value or defining the
> padding fields for different use cases.  Thanks,
> 
> Alex


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 1/5] hisi_acc_vfio_pci: fix XQE dma address error
  2025-01-13 19:45           ` Alex Williamson
@ 2025-01-14  3:17             ` liulongfang
  2025-01-14  8:07               ` Shameerali Kolothum Thodi
  0 siblings, 1 reply; 21+ messages in thread
From: liulongfang @ 2025-01-14  3:17 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Shameerali Kolothum Thodi, jgg@nvidia.com, Jonathan Cameron,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	linuxarm@openeuler.org

On 2025/1/14 3:45, Alex Williamson wrote:
> On Mon, 13 Jan 2025 08:34:41 -0500
> Alex Williamson <alex.williamson@redhat.com> wrote:
> 
>> On Mon, 13 Jan 2025 17:43:09 +0800
>> liulongfang <liulongfang@huawei.com> wrote:
>>
>>> On 2025/1/3 6:30, Alex Williamson wrote:  
>>>> On Thu, 19 Dec 2024 10:01:03 +0000
>>>> Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com> wrote:
>>>>     
>>>>>> -----Original Message-----
>>>>>> From: liulongfang <liulongfang@huawei.com>
>>>>>> Sent: Thursday, December 19, 2024 9:18 AM
>>>>>> To: alex.williamson@redhat.com; jgg@nvidia.com; Shameerali Kolothum
>>>>>> Thodi <shameerali.kolothum.thodi@huawei.com>; Jonathan Cameron
>>>>>> <jonathan.cameron@huawei.com>
>>>>>> Cc: kvm@vger.kernel.org; linux-kernel@vger.kernel.org;
>>>>>> linuxarm@openeuler.org; liulongfang <liulongfang@huawei.com>
>>>>>> Subject: [PATCH v2 1/5] hisi_acc_vfio_pci: fix XQE dma address error
>>>>>>
>>>>>> The dma addresses of EQE and AEQE are wrong after migration and
>>>>>> results in guest kernel-mode encryption services  failure.
>>>>>> Comparing the definition of hardware registers, we found that
>>>>>> there was an error when the data read from the register was
>>>>>> combined into an address. Therefore, the address combination
>>>>>> sequence needs to be corrected.
>>>>>>
>>>>>> Even after fixing the above problem, we still have an issue
>>>>>> where the Guest from an old kernel can get migrated to
>>>>>> new kernel and may result in wrong data.
>>>>>>
>>>>>> In order to ensure that the address is correct after migration,
>>>>>> if an old magic number is detected, the dma address needs to be
>>>>>> updated.
>>>>>>
>>>>>> Fixes:b0eed085903e("hisi_acc_vfio_pci: Add support for VFIO live
>>>>>> migration")
>>>>>> Signed-off-by: Longfang Liu <liulongfang@huawei.com>
>>>>>> ---
>>>>>>  .../vfio/pci/hisilicon/hisi_acc_vfio_pci.c    | 34 +++++++++++++++----
>>>>>>  .../vfio/pci/hisilicon/hisi_acc_vfio_pci.h    |  9 ++++-
>>>>>>  2 files changed, 36 insertions(+), 7 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
>>>>>> b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
>>>>>> index 451c639299eb..8518efea3a52 100644
>>>>>> --- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
>>>>>> +++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
>>>>>> @@ -350,6 +350,27 @@ static int vf_qm_func_stop(struct hisi_qm *qm)
>>>>>>  	return hisi_qm_mb(qm, QM_MB_CMD_PAUSE_QM, 0, 0, 0);
>>>>>>  }
>>>>>>
>>>>>> +static int vf_qm_magic_check(struct acc_vf_data *vf_data)
>>>>>> +{
>>>>>> +	switch (vf_data->acc_magic) {
>>>>>> +	case ACC_DEV_MAGIC_V2:
>>>>>> +		break;
>>>>>> +	case ACC_DEV_MAGIC_V1:
>>>>>> +		/* Correct dma address */
>>>>>> +		vf_data->eqe_dma = vf_data-      
>>>>>>> qm_eqc_dw[QM_XQC_ADDR_HIGH];      
>>>>>> +		vf_data->eqe_dma <<= QM_XQC_ADDR_OFFSET;
>>>>>> +		vf_data->eqe_dma |= vf_data-      
>>>>>>> qm_eqc_dw[QM_XQC_ADDR_LOW];      
>>>>>> +		vf_data->aeqe_dma = vf_data-      
>>>>>>> qm_aeqc_dw[QM_XQC_ADDR_HIGH];      
>>>>>> +		vf_data->aeqe_dma <<= QM_XQC_ADDR_OFFSET;
>>>>>> +		vf_data->aeqe_dma |= vf_data-      
>>>>>>> qm_aeqc_dw[QM_XQC_ADDR_LOW];      
>>>>>> +		break;
>>>>>> +	default:
>>>>>> +		return -EINVAL;
>>>>>> +	}
>>>>>> +
>>>>>> +	return 0;
>>>>>> +}
>>>>>> +
>>>>>>  static int vf_qm_check_match(struct hisi_acc_vf_core_device
>>>>>> *hisi_acc_vdev,
>>>>>>  			     struct hisi_acc_vf_migration_file *migf)
>>>>>>  {
>>>>>> @@ -363,7 +384,8 @@ static int vf_qm_check_match(struct
>>>>>> hisi_acc_vf_core_device *hisi_acc_vdev,
>>>>>>  	if (migf->total_length < QM_MATCH_SIZE || hisi_acc_vdev-      
>>>>>>> match_done)      
>>>>>>  		return 0;
>>>>>>
>>>>>> -	if (vf_data->acc_magic != ACC_DEV_MAGIC) {
>>>>>> +	ret = vf_qm_magic_check(vf_data);
>>>>>> +	if (ret) {
>>>>>>  		dev_err(dev, "failed to match ACC_DEV_MAGIC\n");
>>>>>>  		return -EINVAL;
>>>>>>  	}
>>>>>> @@ -418,7 +440,7 @@ static int vf_qm_get_match_data(struct
>>>>>> hisi_acc_vf_core_device *hisi_acc_vdev,
>>>>>>  	int vf_id = hisi_acc_vdev->vf_id;
>>>>>>  	int ret;
>>>>>>
>>>>>> -	vf_data->acc_magic = ACC_DEV_MAGIC;
>>>>>> +	vf_data->acc_magic = ACC_DEV_MAGIC_V2;
>>>>>>  	/* Save device id */
>>>>>>  	vf_data->dev_id = hisi_acc_vdev->vf_dev->device;
>>>>>>
>>>>>> @@ -496,12 +518,12 @@ static int vf_qm_read_data(struct hisi_qm
>>>>>> *vf_qm, struct acc_vf_data *vf_data)
>>>>>>  		return -EINVAL;
>>>>>>
>>>>>>  	/* Every reg is 32 bit, the dma address is 64 bit. */
>>>>>> -	vf_data->eqe_dma = vf_data->qm_eqc_dw[1];
>>>>>> +	vf_data->eqe_dma = vf_data->qm_eqc_dw[QM_XQC_ADDR_HIGH];
>>>>>>  	vf_data->eqe_dma <<= QM_XQC_ADDR_OFFSET;
>>>>>> -	vf_data->eqe_dma |= vf_data->qm_eqc_dw[0];
>>>>>> -	vf_data->aeqe_dma = vf_data->qm_aeqc_dw[1];
>>>>>> +	vf_data->eqe_dma |= vf_data->qm_eqc_dw[QM_XQC_ADDR_LOW];
>>>>>> +	vf_data->aeqe_dma = vf_data-      
>>>>>>> qm_aeqc_dw[QM_XQC_ADDR_HIGH];      
>>>>>>  	vf_data->aeqe_dma <<= QM_XQC_ADDR_OFFSET;
>>>>>> -	vf_data->aeqe_dma |= vf_data->qm_aeqc_dw[0];
>>>>>> +	vf_data->aeqe_dma |= vf_data-      
>>>>>>> qm_aeqc_dw[QM_XQC_ADDR_LOW];      
>>>>>>
>>>>>>  	/* Through SQC_BT/CQC_BT to get sqc and cqc address */
>>>>>>  	ret = qm_get_sqc(vf_qm, &vf_data->sqc_dma);
>>>>>> diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h
>>>>>> b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h
>>>>>> index 245d7537b2bc..2afce68f5a34 100644
>>>>>> --- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h
>>>>>> +++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h
>>>>>> @@ -39,6 +39,9 @@
>>>>>>  #define QM_REG_ADDR_OFFSET	0x0004
>>>>>>
>>>>>>  #define QM_XQC_ADDR_OFFSET	32U
>>>>>> +#define QM_XQC_ADDR_LOW	0x1
>>>>>> +#define QM_XQC_ADDR_HIGH	0x2
>>>>>> +
>>>>>>  #define QM_VF_AEQ_INT_MASK	0x0004
>>>>>>  #define QM_VF_EQ_INT_MASK	0x000c
>>>>>>  #define QM_IFC_INT_SOURCE_V	0x0020
>>>>>> @@ -50,10 +53,14 @@
>>>>>>  #define QM_EQC_DW0		0X8000
>>>>>>  #define QM_AEQC_DW0		0X8020
>>>>>>
>>>>>> +enum acc_magic_num {
>>>>>> +	ACC_DEV_MAGIC_V1 = 0XCDCDCDCDFEEDAACC,
>>>>>> +	ACC_DEV_MAGIC_V2 = 0xAACCFEEDDECADEDE,      
>>>>>
>>>>>
>>>>> I think we have discussed this before that having some kind of 
>>>>> version info embed into magic_num will be beneficial going 
>>>>> forward. ie, may be use the last 4 bytes for denoting version.
>>>>>
>>>>> ACC_DEV_MAGIC_V2 = 0xAACCFEEDDECA0002
>>>>>
>>>>> The reason being, otherwise we have to come up with a random
>>>>> magic each time when a fix like this is required in future.    
>>>>
>>>> Overloading the magic value like this is a bit strange.  Typically
>>>> the magic value should be the cookie that identifies the data blob as
>>>> generated by this driver and a separate version field would identify
>>>> any content or format changes.  In the mtty driver I included a magic
>>>> field along with major and minor versions to provide this flexibility.
>>>> I wonder why we wouldn't do something similar here rather than create
>>>> this combination magic/version field.  It's easy enough to append a
>>>> couple fields onto the structure or redefine a v2 structure to use
>>>> going forward.  There's even a padding field in the structure already
>>>> that could be repurposed.
>>>>    
>>>
>>> If we add the major and minor version numbers like mtty driver, the current
>>> data structure needs to be changed, and this change also needs to be compatible
>>> with the old and new versions.
>>> And the old version does not have this information. The new version cannot
>>> be adapted when it is imported.  
>>
>> Are you suggesting that we cannot use a different data structure based
>> on the magic value?  That's rather the point of the magic value.
>>
>>> Now in this patch to fix this problem, we only need to update a magic number,
>>> and then the problem is corrected. Migration between old and new versions can
>>> also be fixed:
>>>
>>> Old --> Old, wrong, but we can't stop it;
>>> Old --> New, corrected migrated incorrect data, recovery successful;
>>> New --> New, correct, recovery successful;
>>> New --> Old, correct, recovery successful.
> 
> BTW, New->Old is not possible given the current old code and this
> series doesn't enable it.  New magic is saved, old restore could should
> reject it:
> 
> static int vf_qm_check_match(struct hisi_acc_vf_core_device *hisi_acc_vdev,
>                              struct hisi_acc_vf_migration_file *migf)
> {       
> ...
>         if (vf_data->acc_magic != ACC_DEV_MAGIC) { 
>                 dev_err(dev, "failed to match ACC_DEV_MAGIC\n");
>                 return -EINVAL; 
>         }
> 
>>From this patch:

Yes, New->Old will be rejected when matching like this.

> 
> @@ -418,7 +440,7 @@ static int vf_qm_get_match_data(struct hisi_acc_vf_core_device *hisi_acc_vdev,
>  	int vf_id = hisi_acc_vdev->vf_id;
>  	int ret;
>  
> -	vf_data->acc_magic = ACC_DEV_MAGIC;
> +	vf_data->acc_magic = ACC_DEV_MAGIC_V2;
>  	/* Save device id */
>  	vf_data->dev_id = hisi_acc_vdev->vf_dev->device;
>  
> Thanks,
> Alex
> 
>>>
>>> As for the compatibility issues between old and new versions in the future,
>>> we do not need to reserve version numbers to deal with them now. Because
>>> before encountering specific problems, our design may be redundant.  
>>
>> A magic value + version number would prevent the need to replace the
>> magic value every time an issue is encountered, which I think was also
>> Shameer's commit, which is not addressed by forcing the formatting of a
>> portion of the magic value.  None of what you're trying to do here
>> precludes a new data structure for the new magic value or defining the
>> padding fields for different use cases.  Thanks,
>>
>> Alex
> 

If we want to use the original magic number, we also need to add the major
and minor version numbers. It does not cause compatibility issues and can
only reuse the original u64 memory space.

This is also Shameerali's previous plan. Do you accept this plan?

Thanks.
Longfang.

> 
> .
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: [PATCH v2 1/5] hisi_acc_vfio_pci: fix XQE dma address error
  2025-01-14  3:17             ` liulongfang
@ 2025-01-14  8:07               ` Shameerali Kolothum Thodi
  2025-01-14 13:54                 ` Alex Williamson
  0 siblings, 1 reply; 21+ messages in thread
From: Shameerali Kolothum Thodi @ 2025-01-14  8:07 UTC (permalink / raw)
  To: liulongfang, Alex Williamson
  Cc: jgg@nvidia.com, Jonathan Cameron, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, linuxarm@openeuler.org



> -----Original Message-----
> From: liulongfang <liulongfang@huawei.com>
> Sent: Tuesday, January 14, 2025 3:18 AM
> To: Alex Williamson <alex.williamson@redhat.com>
> Cc: Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@huawei.com>; jgg@nvidia.com; Jonathan
> Cameron <jonathan.cameron@huawei.com>; kvm@vger.kernel.org; linux-
> kernel@vger.kernel.org; linuxarm@openeuler.org
> Subject: Re: [PATCH v2 1/5] hisi_acc_vfio_pci: fix XQE dma address error
> 

[...]

> > @@ -418,7 +440,7 @@ static int vf_qm_get_match_data(struct
> hisi_acc_vf_core_device *hisi_acc_vdev,
> >  	int vf_id = hisi_acc_vdev->vf_id;
> >  	int ret;
> >
> > -	vf_data->acc_magic = ACC_DEV_MAGIC;
> > +	vf_data->acc_magic = ACC_DEV_MAGIC_V2;
> >  	/* Save device id */
> >  	vf_data->dev_id = hisi_acc_vdev->vf_dev->device;
> >
> > Thanks,
> > Alex
> >
> >>>
> >>> As for the compatibility issues between old and new versions in the
> >>> future, we do not need to reserve version numbers to deal with them
> >>> now. Because before encountering specific problems, our design may
> be redundant.
> >>
> >> A magic value + version number would prevent the need to replace the
> >> magic value every time an issue is encountered, which I think was
> >> also Shameer's commit, which is not addressed by forcing the
> >> formatting of a portion of the magic value.  None of what you're
> >> trying to do here precludes a new data structure for the new magic
> >> value or defining the padding fields for different use cases.
> >> Thanks,
> >>
> >> Alex
> >
> 
> If we want to use the original magic number, we also need to add the major
> and minor version numbers. It does not cause compatibility issues and can
> only reuse the original u64 memory space.
> 
> This is also Shameerali's previous plan. Do you accept this plan?

The suggestion here is to improve my previous plan.. ie, instead of overloading
the V2 MAGIC with version info, introduce version(major:minor) separately.

Something like,

Rename old MAGIC as MAGIC_V1

Introduce new MAGIC as MAGIC_V2

Repurpose any padding or reserved fields in struct vf_data for major:minor
version  fields. The idea of introducing these major:minor is for future use
where instead of coming up with a  new MAGIC data every time we can
increment the version for bug fixes and features if required. 

Hope this is clear now.

Thanks,
Shameer



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 1/5] hisi_acc_vfio_pci: fix XQE dma address error
  2025-01-14  8:07               ` Shameerali Kolothum Thodi
@ 2025-01-14 13:54                 ` Alex Williamson
  0 siblings, 0 replies; 21+ messages in thread
From: Alex Williamson @ 2025-01-14 13:54 UTC (permalink / raw)
  To: Shameerali Kolothum Thodi
  Cc: liulongfang, jgg@nvidia.com, Jonathan Cameron,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	linuxarm@openeuler.org

On Tue, 14 Jan 2025 08:07:43 +0000
Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com> wrote:

> > -----Original Message-----
> > From: liulongfang <liulongfang@huawei.com>
> > Sent: Tuesday, January 14, 2025 3:18 AM
> > To: Alex Williamson <alex.williamson@redhat.com>
> > Cc: Shameerali Kolothum Thodi
> > <shameerali.kolothum.thodi@huawei.com>; jgg@nvidia.com; Jonathan
> > Cameron <jonathan.cameron@huawei.com>; kvm@vger.kernel.org; linux-
> > kernel@vger.kernel.org; linuxarm@openeuler.org
> > Subject: Re: [PATCH v2 1/5] hisi_acc_vfio_pci: fix XQE dma address error
> >   
> 
> [...]
> 
> > > @@ -418,7 +440,7 @@ static int vf_qm_get_match_data(struct  
> > hisi_acc_vf_core_device *hisi_acc_vdev,  
> > >  	int vf_id = hisi_acc_vdev->vf_id;
> > >  	int ret;
> > >
> > > -	vf_data->acc_magic = ACC_DEV_MAGIC;
> > > +	vf_data->acc_magic = ACC_DEV_MAGIC_V2;
> > >  	/* Save device id */
> > >  	vf_data->dev_id = hisi_acc_vdev->vf_dev->device;
> > >
> > > Thanks,
> > > Alex
> > >  
> > >>>
> > >>> As for the compatibility issues between old and new versions in the
> > >>> future, we do not need to reserve version numbers to deal with them
> > >>> now. Because before encountering specific problems, our design may  
> > be redundant.  
> > >>
> > >> A magic value + version number would prevent the need to replace the
> > >> magic value every time an issue is encountered, which I think was
> > >> also Shameer's commit, which is not addressed by forcing the
> > >> formatting of a portion of the magic value.  None of what you're
> > >> trying to do here precludes a new data structure for the new magic
> > >> value or defining the padding fields for different use cases.
> > >> Thanks,
> > >>
> > >> Alex  
> > >  
> > 
> > If we want to use the original magic number, we also need to add the major
> > and minor version numbers. It does not cause compatibility issues and can
> > only reuse the original u64 memory space.
> > 
> > This is also Shameerali's previous plan. Do you accept this plan?  
> 
> The suggestion here is to improve my previous plan.. ie, instead of overloading
> the V2 MAGIC with version info, introduce version(major:minor) separately.
> 
> Something like,
> 
> Rename old MAGIC as MAGIC_V1
> 
> Introduce new MAGIC as MAGIC_V2
> 
> Repurpose any padding or reserved fields in struct vf_data for major:minor
> version  fields. The idea of introducing these major:minor is for future use
> where instead of coming up with a  new MAGIC data every time we can
> increment the version for bug fixes and features if required. 

Yes, there's no suggestion to use the original magic value, that magic
number describes a non-extensible data structure.  As Shameer
indicates, use a new magic value and define within the data structure
specified by that magic value major:minor number and a scheme for
defining compatible (if any) and incompatible updates.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2025-01-14 13:55 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-12-19  9:17 [PATCH v2 0/5] bugfix some driver issues Longfang Liu
2024-12-19  9:17 ` [PATCH v2 1/5] hisi_acc_vfio_pci: fix XQE dma address error Longfang Liu
2024-12-19 10:01   ` Shameerali Kolothum Thodi
2024-12-27  7:24     ` liulongfang
2025-01-02 22:30     ` Alex Williamson
2025-01-13  9:43       ` liulongfang
2025-01-13 13:34         ` Alex Williamson
2025-01-13 19:45           ` Alex Williamson
2025-01-14  3:17             ` liulongfang
2025-01-14  8:07               ` Shameerali Kolothum Thodi
2025-01-14 13:54                 ` Alex Williamson
2024-12-19  9:17 ` [PATCH v2 2/5] hisi_acc_vfio_pci: add eq and aeq interruption restore Longfang Liu
2024-12-19 10:01   ` Shameerali Kolothum Thodi
2024-12-19  9:17 ` [PATCH v2 3/5] hisi_acc_vfio_pci: bugfix cache write-back issue Longfang Liu
2024-12-19 10:01   ` Shameerali Kolothum Thodi
2024-12-19  9:17 ` [PATCH v2 4/5] hisi_acc_vfio_pci: bugfix the problem of uninstalling driver Longfang Liu
2024-12-19 10:01   ` Shameerali Kolothum Thodi
2024-12-19  9:18 ` [PATCH v2 5/5] hisi_acc_vfio_pci: bugfix live migration function without VF device driver Longfang Liu
2024-12-19 10:02   ` Shameerali Kolothum Thodi
2024-12-27  7:20     ` liulongfang
2024-12-19 10:23 ` [PATCH v2 0/5] bugfix some driver issues Shameerali Kolothum Thodi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox