From: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
To: Alex Williamson <alex.williamson@redhat.com>,
liulongfang <liulongfang@huawei.com>
Cc: "jgg@nvidia.com" <jgg@nvidia.com>,
Jonathan Cameron <jonathan.cameron@huawei.com>,
"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linuxarm@openeuler.org" <linuxarm@openeuler.org>
Subject: RE: [PATCH v4 1/5] hisi_acc_vfio_pci: fix XQE dma address error
Date: Wed, 26 Feb 2025 08:11:17 +0000 [thread overview]
Message-ID: <024fd8e2334141b688150650728699ba@huawei.com> (raw)
In-Reply-To: <20250225170941.46b0ede5.alex.williamson@redhat.com>
> -----Original Message-----
> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Wednesday, February 26, 2025 12:10 AM
> To: liulongfang <liulongfang@huawei.com>
> Cc: jgg@nvidia.com; Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@huawei.com>; Jonathan Cameron
> <jonathan.cameron@huawei.com>; kvm@vger.kernel.org; linux-
> kernel@vger.kernel.org; linuxarm@openeuler.org
> Subject: Re: [PATCH v4 1/5] hisi_acc_vfio_pci: fix XQE dma address error
>
> On Tue, 25 Feb 2025 14:27:53 +0800
> Longfang Liu <liulongfang@huawei.com> wrote:
>
> > The dma addresses of EQE and AEQE are wrong after migration and
> > results in guest kernel-mode encryption services failure.
> > Comparing the definition of hardware registers, we found that
> > there was an error when the data read from the register was
> > combined into an address. Therefore, the address combination
> > sequence needs to be corrected.
> >
> > Even after fixing the above problem, we still have an issue
> > where the Guest from an old kernel can get migrated to
> > new kernel and may result in wrong data.
> >
> > In order to ensure that the address is correct after migration,
> > if an old magic number is detected, the dma address needs to be
> > updated.
> >
> > Fixes: b0eed085903e ("hisi_acc_vfio_pci: Add support for VFIO live
> migration")
> > Signed-off-by: Longfang Liu <liulongfang@huawei.com>
> > ---
> > .../vfio/pci/hisilicon/hisi_acc_vfio_pci.c | 40 ++++++++++++++++---
> > .../vfio/pci/hisilicon/hisi_acc_vfio_pci.h | 14 ++++++-
> > 2 files changed, 46 insertions(+), 8 deletions(-)
> >
> > diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> > index 451c639299eb..35316984089b 100644
> > --- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> > +++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> > @@ -350,6 +350,31 @@ static int vf_qm_func_stop(struct hisi_qm *qm)
> > return hisi_qm_mb(qm, QM_MB_CMD_PAUSE_QM, 0, 0, 0);
> > }
> >
> > +static int vf_qm_version_check(struct acc_vf_data *vf_data, struct device
> *dev)
> > +{
> > + switch (vf_data->acc_magic) {
> > + case ACC_DEV_MAGIC_V2:
> > + if (vf_data->major_ver < ACC_DRV_MAJOR_VER ||
> > + vf_data->minor_ver < ACC_DRV_MINOR_VER)
> > + dev_info(dev, "migration driver version not
> match!\n");
> > + return -EINVAL;
> > + break;
>
> What's your major/minor update strategy?
>
> Note that minor_ver is a u16 and ACC_DRV_MINOR_VER is defined as 0, so
> testing less than 0 against an unsigned is useless.
>
> Arguably testing major and minor independently is pretty useless too.
>
> You're defining what a future "old" driver version will accept, which
> is very nearly anything, so any breaking change *again* requires a new
> magic, so we're accomplishing very little here.
>
> Maybe you want to consider a strategy where you'd increment the major
> number for a breaking change and minor for a compatible feature. In
> that case you'd want to verify the major_ver matches
> ACC_DRV_MAJOR_VER
> exactly and minor_ver would be more of a feature level.
Agree. I think the above check should be just major_ver != ACC_DRV_MAJOR_VER
and we can make use of minor version whenever we need a specific handling for
a feature support.
Also I think it would be good to print the version info above in case of mismatch.
>
> > + case ACC_DEV_MAGIC_V1:
> > + /* Correct dma address */
> > + vf_data->eqe_dma = vf_data-
> >qm_eqc_dw[QM_XQC_ADDR_HIGH];
> > + vf_data->eqe_dma <<= QM_XQC_ADDR_OFFSET;
> > + vf_data->eqe_dma |= vf_data-
> >qm_eqc_dw[QM_XQC_ADDR_LOW];
> > + vf_data->aeqe_dma = vf_data-
> >qm_aeqc_dw[QM_XQC_ADDR_HIGH];
> > + vf_data->aeqe_dma <<= QM_XQC_ADDR_OFFSET;
> > + vf_data->aeqe_dma |= vf_data-
> >qm_aeqc_dw[QM_XQC_ADDR_LOW];
> > + break;
> > + default:
> > + return -EINVAL;
> > + }
> > +
> > + return 0;
> > +}
> > +
> > static int vf_qm_check_match(struct hisi_acc_vf_core_device
> *hisi_acc_vdev,
> > struct hisi_acc_vf_migration_file *migf)
> > {
> > @@ -363,7 +388,8 @@ static int vf_qm_check_match(struct
> hisi_acc_vf_core_device *hisi_acc_vdev,
> > if (migf->total_length < QM_MATCH_SIZE || hisi_acc_vdev-
> >match_done)
> > return 0;
> >
> > - if (vf_data->acc_magic != ACC_DEV_MAGIC) {
> > + ret = vf_qm_version_check(vf_data, dev);
> > + if (ret) {
> > dev_err(dev, "failed to match ACC_DEV_MAGIC\n");
> > return -EINVAL;
> > }
> > @@ -418,7 +444,9 @@ static int vf_qm_get_match_data(struct
> hisi_acc_vf_core_device *hisi_acc_vdev,
> > int vf_id = hisi_acc_vdev->vf_id;
> > int ret;
> >
> > - vf_data->acc_magic = ACC_DEV_MAGIC;
> > + vf_data->acc_magic = ACC_DEV_MAGIC_V2;
> > + vf_data->major_ver = ACC_DRV_MAR;
> > + vf_data->minor_ver = ACC_DRV_MIN;
>
> Where are "MAR" and "MIN" defined? I can't see how this would even
> compile. Thanks,
Yes. Please make sure to do a compile test and run basic sanity tested even if you
think the changes are minor. Chances are there that you will miss out things like
this.
Thanks,
Shameer
next prev parent reply other threads:[~2025-02-26 8:11 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-02-25 6:27 [PATCH v4 0/5] bugfix some driver issues Longfang Liu
2025-02-25 6:27 ` [PATCH v4 1/5] hisi_acc_vfio_pci: fix XQE dma address error Longfang Liu
2025-02-26 0:09 ` Alex Williamson
2025-02-26 8:11 ` Shameerali Kolothum Thodi [this message]
2025-02-26 11:41 ` liulongfang
2025-02-26 11:21 ` liulongfang
2025-02-27 18:00 ` kernel test robot
2025-02-28 11:55 ` kernel test robot
2025-03-03 11:14 ` liulongfang
2025-02-25 6:27 ` [PATCH v4 2/5] hisi_acc_vfio_pci: add eq and aeq interruption restore Longfang Liu
2025-02-25 6:27 ` [PATCH v4 3/5] hisi_acc_vfio_pci: bugfix cache write-back issue Longfang Liu
2025-02-25 6:27 ` [PATCH v4 4/5] hisi_acc_vfio_pci: bugfix the problem of uninstalling driver Longfang Liu
2025-02-25 6:27 ` [PATCH v4 5/5] hisi_acc_vfio_pci: bugfix live migration function without VF device driver Longfang Liu
2025-02-26 9:26 ` Shameerali Kolothum Thodi
2025-02-26 11:53 ` liulongfang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=024fd8e2334141b688150650728699ba@huawei.com \
--to=shameerali.kolothum.thodi@huawei.com \
--cc=alex.williamson@redhat.com \
--cc=jgg@nvidia.com \
--cc=jonathan.cameron@huawei.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linuxarm@openeuler.org \
--cc=liulongfang@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox