From: Jason Gunthorpe <jgg@nvidia.com>
To: "Tian, Kevin" <kevin.tian@intel.com>
Cc: Brett Creeley <brett.creeley@amd.com>,
"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
"alex.williamson@redhat.com" <alex.williamson@redhat.com>,
"yishaih@nvidia.com" <yishaih@nvidia.com>,
"shameerali.kolothum.thodi@huawei.com"
<shameerali.kolothum.thodi@huawei.com>,
"shannon.nelson@amd.com" <shannon.nelson@amd.com>
Subject: Re: [PATCH v10 vfio 4/7] vfio/pds: Add VFIO live migration support
Date: Mon, 26 Jun 2023 15:13:53 -0300
Message-ID: <ZJnVYczb9M/wugO8@nvidia.com>
In-Reply-To: <BN9PR11MB52762ECFCA869B97BDD2AA9D8C26A@BN9PR11MB5276.namprd11.prod.outlook.com>

On Mon, Jun 26, 2023 at 07:31:31AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Wednesday, June 21, 2023 9:27 PM
> >
> > On Wed, Jun 21, 2023 at 06:49:12AM +0000, Tian, Kevin wrote:
> >
> > > What is the criteria for 'reasonable'? How does CSPs judge that such
> > > device can guarantee a *reliable* reasonable window so live migration
> > > can be enabled in the production environment?
> >
> > The CSP needs to work with the device vendor to understand how it fits
> > into their system, I don't see how we can externalize this kind of
> > detail in a general way.
> >
> > > I'm afraid that we are hiding a non-deterministic factor in current protocol.
> >
> > Yes
> >
> > > But still I don't think it's a good situation where the user has ZERO
> > > knowledge about the non-negligible time in the stopping path...
> >
> > In any sane device design this will be a small period of time. These
> > timeouts should be to protect against a device that has gone wild.
> >
>
> Any example how 'small' it will be (e.g. <1ms)?

Not personally..

> Should we define a *reasonable* threshold in VFIO community which
> any new variant driver should provide information to judge against?

Ah, I think we are just too new to get into such details. I think we
need some real world experience to see if this is really an issue.

> The reason why I keep discussing it is that IMHO achieving negligible
> stop time is a very challenging task for many accelerators. e.g. IDXD
> can be stopped only after completing all the pending requests. While
> it allows software to configure the max pending work size (and a
> reasonable setting could meet both migration SLA and performance
> SLA) the worst-case draining latency could be in 10's milliseconds which
> cannot be ignored by the VMM.

Well, what would you report here if you had the opportunity to report
something? Some big number? Then what?

> Or do you think it's still better left to CSP working with the device vendor
> even in this case, given the worst-case latency could be affected by
> many factors hence not something which a kernel driver can accurately
> estimate?

This is my fear, that it is so complicated that reducing it to any
sort of cross-vendor data is not feasible. At least I'd like to see
someone experiment with what information would be useful to qemu
before we add kernel ABI..

Jason