From: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
To: Jason Gunthorpe <jgg@nvidia.com>,
Alex Williamson <alex.williamson@redhat.com>
Cc: "kvm@vger.kernel.org" <kvm@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-crypto@vger.kernel.org" <linux-crypto@vger.kernel.org>,
"cohuck@redhat.com" <cohuck@redhat.com>,
"mgurtovoy@nvidia.com" <mgurtovoy@nvidia.com>,
"yishaih@nvidia.com" <yishaih@nvidia.com>,
Linuxarm <linuxarm@huawei.com>,
liulongfang <liulongfang@huawei.com>,
"Zengtao (B)" <prime.zeng@hisilicon.com>,
Jonathan Cameron <jonathan.cameron@huawei.com>,
"Wangzhou (B)" <wangzhou1@hisilicon.com>
Subject: RE: [PATCH v6 09/10] hisi_acc_vfio_pci: Add support for VFIO live migration
Date: Wed, 2 Mar 2022 09:07:38 +0000
Message-ID: <635f11c40e814d749ccf533f1414ba4e@huawei.com>
In-Reply-To: <20220302000329.GZ219866@nvidia.com>
> -----Original Message-----
> From: Jason Gunthorpe [mailto:jgg@nvidia.com]
> Sent: 02 March 2022 00:03
> To: Alex Williamson <alex.williamson@redhat.com>
> Cc: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>;
> kvm@vger.kernel.org; linux-kernel@vger.kernel.org;
> linux-crypto@vger.kernel.org; cohuck@redhat.com; mgurtovoy@nvidia.com;
> yishaih@nvidia.com; Linuxarm <linuxarm@huawei.com>; liulongfang
> <liulongfang@huawei.com>; Zengtao (B) <prime.zeng@hisilicon.com>;
> Jonathan Cameron <jonathan.cameron@huawei.com>; Wangzhou (B)
> <wangzhou1@hisilicon.com>
> Subject: Re: [PATCH v6 09/10] hisi_acc_vfio_pci: Add support for VFIO live
> migration
>
> On Tue, Mar 01, 2022 at 03:44:31PM -0700, Alex Williamson wrote:
> > On Tue, 1 Mar 2022 16:39:38 -0400
> > Jason Gunthorpe <jgg@nvidia.com> wrote:
> >
> > > On Tue, Mar 01, 2022 at 12:30:47PM -0700, Alex Williamson wrote:
> > > > Wouldn't it make more sense if initial-bytes started at QM_MATCH_SIZE
> > > > > and dirty-bytes was always sizeof(vf_data) - QM_MATCH_SIZE? ie. QEMU
> > > > > would know that it has sizeof(vf_data) - QM_MATCH_SIZE remaining even
> > > > while it's getting ENOMSG after reading QM_MATCH_SIZE bytes of data.
> > >
> > > The purpose of this ioctl is to help userspace guess when moving on to
> > > STOP_COPY is a good idea ie when the device has done almost all the
> > > work it is going to be able to do in PRE_COPY. ENOMSG is a similar
> > > indicator.
> > >
> > > I expect all devices to have some additional STOP_COPY trailer_data in
> > > addition to their PRE_COPY initial_data and dirty_data
> > >
> > > There is a choice to make if we report the trailer_data during
> > > PRE_COPY or not. As this is all estimates, it doesn't matter unless
> > > the trailer_data is very big.
> > >
> > > > Having all devices trend toward a 0 dirty_bytes to say they have
> > > > done all the pre-copy they can do makes sense from an API
> > > perspective. If one device trends toward 10MB due to a big
> > > trailer_data and one trends toward 0 bytes, how will qemu consistently
> > > decide when best to trigger STOP_COPY? It makes the API less useful.
> > >
> > > So, I would not include trailer_data in the dirty_bytes.
> >
> > That assumes that it's possible to keep up with the device dirty
> > rate.
>
> It keeps options open so we have this choice someday.
>
> We already see that implementations are using vCPU throttling as part
> of their migration strategy, and we are seriously looking at DMA
> throttling. It is not a big leap to imagine that
> internal-state-dirtying throttling will happen someday.
>
> With throttling, iterations would ratchet up the throttle until they
> reach an absolutely small amount of dirty data, then cut over to STOP_COPY.
>
> > It seems like a better approach for userspace would be to look at how
> > dirty_bytes is trending.
>
> It may be biw, but this approach doesn't care if the trailing_bytes
> are included or not, so let's leave them out and preserve the other
> operating model.
>
> > If we exclude STOP_COPY trailing data from the VFIO_DEVICE_MIG_PRECOPY
> > ioctl, it seems even more of a disconnect that when we enter the
> > STOP_COPY state, suddenly we start getting new data out of a PRECOPY
> > ioctl.
>
> Why? Those amounts can go up at any time, how does it matter if it goes
> up after STOP_COPY or instantly before?
>
> > BTW, "VFIO_DEVICE" should be reserved for ioctls and data structures
> > relative to the device FD, appending it with _MIG is too subtle for me.
> > This is also a GET operation for INFO, so I'd think for consistency
> > with the existing vfio uAPI we'd name this something like
> > VFIO_MIG_GET_PRECOPY_INFO where the structure might be named
> > vfio_precopy_info.
>
> Sure
>
> > So if we don't think this is the right approach for STOP_COPY, then why
> > are we pushing that it has any purpose outside of PRECOPY or might be
> > implemented by a non-PRECOPY driver for use in STOP_COPY?
>
> It is just simpler and more consistent to implement the math under
> this ioctl in all cases than to try and artificially restrict it.
>
> But I don't have a use case for it, so let's block it if you prefer.
>
> Shameerali will you make these adjustments to the PRE_COPY patch?

Sure. I think we can summarize the discussion as below:

- Rename the MIG_PRECOPY ioctl to VFIO_MIG_GET_PRECOPY_INFO and
  the structure to vfio_precopy_info.
- This ioctl is only valid in the PRE_COPY state and should return -EINVAL
  in other states (update the documentation).
- No changes to the initial_bytes & dirty_bytes descriptions.

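To make the second point concrete, here is a rough user-space model of the
state check, not the actual driver code. The enum, both struct types and the
get_precopy_info() helper are hypothetical names made up for illustration;
only initial_bytes, dirty_bytes and the -EINVAL return come from the
discussion above.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-ins for the migration states named in this thread. */
enum mig_state_model { MIG_RUNNING, MIG_PRE_COPY, MIG_STOP_COPY };

/* Field names follow the thread; the layout itself is an assumption. */
struct precopy_info_model {
	uint64_t initial_bytes;
	uint64_t dirty_bytes;
};

/* Hypothetical device model: current state plus the remaining estimates. */
struct dev_model {
	enum mig_state_model state;
	uint64_t initial_left;
	uint64_t dirty_left;
};

/*
 * Model of the proposed rule: report the estimates only while the device
 * is in PRE_COPY, and return -EINVAL in every other state.
 */
static int get_precopy_info(const struct dev_model *dev,
			    struct precopy_info_model *info)
{
	if (dev->state != MIG_PRE_COPY)
		return -EINVAL;

	info->initial_bytes = dev->initial_left;
	info->dirty_bytes = dev->dirty_left;
	return 0;
}
```

With this rule, a caller that has already moved the device to STOP_COPY
gets -EINVAL instead of a fresh estimate.
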
Please let me know if I missed anything.

I will address other comments on this series as well and send out a
revised one soon.

Thanks,
Shameer