From: Jason Gunthorpe <jgg@nvidia.com>
To: "Tian, Kevin" <kevin.tian@intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>,
"cohuck@redhat.com" <cohuck@redhat.com>,
"corbet@lwn.net" <corbet@lwn.net>,
"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
"linux-doc@vger.kernel.org" <linux-doc@vger.kernel.org>,
"farman@linux.ibm.com" <farman@linux.ibm.com>,
"mjrosato@linux.ibm.com" <mjrosato@linux.ibm.com>,
"pasic@linux.ibm.com" <pasic@linux.ibm.com>,
"Lu, Baolu" <baolu.lu@intel.com>
Subject: Re: [RFC PATCH] vfio: Update/Clarify migration uAPI, add NDMA state
Date: Mon, 10 Jan 2022 13:52:59 -0400
Message-ID: <20220110175259.GG2328285@nvidia.com>
In-Reply-To: <BN9PR11MB52766CFF8183099A16DFEFC68C509@BN9PR11MB5276.namprd11.prod.outlook.com>
On Mon, Jan 10, 2022 at 03:14:44AM +0000, Tian, Kevin wrote:
> > An operator might need to emergency migrate a VM without the
> > possibility for failure. For instance there is something wrong with
> > the base HW. SLA ignored, migration must be done.
>
> How is it done today when no assigned device supports migration?
That is different: the operator deliberately created a VM that is not
migratable. Operators may simply prefer to never do this.
You are talking about migration that the guest can block, outside of
operator control. This is a totally different thing.
> - It's necessary to support existing HW though it may only supports
> optional migration due to unbounded time of stopping DMA;
At a minimum a device with optional migration needs to be reported to
userspace and qemu should not blindly adopt it without some opt-in
IMHO.
> - We should influence IP designers to design HW to allow preempting
> in-fly requests and stop DMA quickly (also implying the capability of
> aborting/resuming in-fly PRI requests);
Yes, I think we need a way to suspend the device then abort its PRIs
with some error. The ATS cache is not something that is migrated so
this seems reasonable.
The only sketchy bit looks like how to resync the VM that still would
have a PRI in its queue and would still want to answer it. That answer
would have to be discarded.
> - Specific to the device state management uAPI, it should not assume
> a specific usage and instead allow the user to set a timeout value so
> transitioning to NDMA is failed if the operation cannot be completed
> within the specified timeout value. If the user doesn't set it, the
> migration driver could conservatively use a default timeout value to
> gate any potentially unbounded operation.
This would need to go along with the flag above, as only drivers with
optional migration should have something like this.
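As a rough illustration of how the opt-in flag and the timeout could
combine, here is a minimal sketch in C. Every name in it (the
HYP_MIG_CAP_OPTIONAL bit, struct hyp_mig_policy, hyp_mig_decide(), the
1000 ms default) is hypothetical and is not part of any existing or
proposed VFIO uAPI. It only shows the policy: devices with guaranteed
migration never get a timeout, and devices with optional migration are
refused unless the operator opted in.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical capability bit: the device cannot bound the time it
 * needs to stop DMA, so its migration is only "optional". */
#define HYP_MIG_CAP_OPTIONAL (1u << 0)

/* Hypothetical conservative default, used when no timeout is set. */
#define HYP_DEFAULT_NDMA_TIMEOUT_MS 1000u

struct hyp_mig_policy {
	uint32_t dev_caps;        /* capability bits reported by the device */
	bool user_opt_in;         /* operator accepted optional migration */
	uint32_t user_timeout_ms; /* 0 means "not set by the user" */
};

enum hyp_mig_decision {
	HYP_MIG_REFUSE,       /* optional migration without opt-in */
	HYP_MIG_GUARANTEED,   /* migration cannot be blocked, no timeout */
	HYP_MIG_WITH_TIMEOUT, /* optional migration, NDMA gated by timeout */
};

/* Decide how the NDMA transition should be gated. On return,
 * *timeout_ms is 0 unless the decision is HYP_MIG_WITH_TIMEOUT. */
static enum hyp_mig_decision
hyp_mig_decide(const struct hyp_mig_policy *p, uint32_t *timeout_ms)
{
	*timeout_ms = 0;

	/* Guaranteed migration: a timeout makes no sense here. */
	if (!(p->dev_caps & HYP_MIG_CAP_OPTIONAL))
		return HYP_MIG_GUARANTEED;

	/* Optional migration must not be adopted blindly. */
	if (!p->user_opt_in)
		return HYP_MIG_REFUSE;

	/* Use the user's timeout, else fall back to the default. */
	*timeout_ms = p->user_timeout_ms ? p->user_timeout_ms
					 : HYP_DEFAULT_NDMA_TIMEOUT_MS;
	return HYP_MIG_WITH_TIMEOUT;
}
```

A caller would fail the transition to NDMA if it does not complete
within the returned timeout. Where that timeout is enforced (userspace
or the migration driver) is left open by this sketch.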
Jason