kvm.vger.kernel.org archive mirror
From: Jason Gunthorpe <jgg@nvidia.com>
To: "Tian, Kevin" <kevin.tian@intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>,
	"cohuck@redhat.com" <cohuck@redhat.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"farman@linux.ibm.com" <farman@linux.ibm.com>,
	"mjrosato@linux.ibm.com" <mjrosato@linux.ibm.com>,
	"pasic@linux.ibm.com" <pasic@linux.ibm.com>,
	Yishai Hadas <yishaih@nvidia.com>
Subject: Re: [PATCH RFC] vfio: Revise and update the migration uAPI description
Date: Tue, 25 Jan 2022 21:32:58 -0400
Message-ID: <20220126013258.GN84788@nvidia.com>
In-Reply-To: <BN9PR11MB5276AFC1BDE4B4D9634947C28C209@BN9PR11MB5276.namprd11.prod.outlook.com>

On Wed, Jan 26, 2022 at 01:17:26AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Tuesday, January 25, 2022 9:12 PM
> > 
> > On Tue, Jan 25, 2022 at 03:55:31AM +0000, Tian, Kevin wrote:
> > > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > > Sent: Saturday, January 15, 2022 3:35 AM
> > > > + *
> > > > + *   The peer to peer (P2P) quiescent state is intended to be a quiescent
> > > > + *   state for the device for the purposes of managing multiple devices within
> > > > + *   a user context where peer-to-peer DMA between devices may be active. The
> > > > + *   PRE_COPY_P2P and RUNNING_P2P states must prevent the device from
> > > > + *   initiating any new P2P DMA transactions. If the device can identify P2P
> > > > + *   transactions then it can stop only P2P DMA, otherwise it must stop all
> > > > + *   DMA.  The migration driver must complete any such outstanding operations
> > > > + *   prior to completing the FSM arc into either P2P state.
> > > > + *
> > >
> > > > Now NDMA is renamed to P2P... but we did discuss the potential
> > > > use of this state on devices which cannot stop DMA quickly and
> > > > thus need to drain pending page requests, which further requires
> > > > running vCPUs if the fault is on the guest I/O page table.
> > 
> > I think this needs to be fleshed out more before we can add it,
> > ideally along with a driver and some qemu implementation
> 
> Yes. We have internal implementation but it has to be cleaned up
> based on this new proposal.
> 
> > 
> > It looks like the qemu part for this will not be so easy..
> > 
> 
> My point is that we know this usage is on the radar (though it needs more
> discussion with a real example), so does it make sense to make the
> current name more general? I'm not sure how many devices can figure
> out P2P from normal DMAs. If most devices have to stop all DMAs to
> meet the requirement, calling it a name about stopping all DMAs doesn't
> hurt the current P2P requirement and is more extensible to cover other
> stop-dma requirements.

Except you are not talking about stopping all DMAs, you are talking
about a state that might hang indefinitely waiting for a vPRI to
complete.

In my mind this is completely different, and may motivate another
state in the graph

  PRE_COPY -> PRE_COPY_STOP_PRI -> PRE_COPY_STOP_P2P -> STOP_COPY

STOP_PRI can be defined as halting any new PRIs and always returning
immediately.

STOP_P2P can hang if PRIs are open.

This affords a pretty clean approach for userspace to conclude the
open PRIs or decide it has to give up the migration.

Theoretical future devices that can support aborting PRI would not use
this state and would have STOP_P2P as also being NO_PRI. On this
device userspace would somehow abort the PRIs when it reaches
STOP_COPY.

Or at least that is one possibility.
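
As a rough illustration of the ordering above, here is a minimal C
sketch. The enum names mirror the states in the arc; mig_next() and
mig_arc_may_block() are hypothetical helpers for illustration only and
are not part of the vfio uAPI or of any posted patch. It assumes, as
described above, that only the arc into the STOP_P2P state can wait on
open PRIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state names, taken from the arc sketched above. */
enum mig_state {
	PRE_COPY,
	PRE_COPY_STOP_PRI,
	PRE_COPY_STOP_P2P,
	STOP_COPY,
	MIG_NUM_STATES,
};

/* Next state on the forward path; STOP_COPY is terminal in this sketch. */
static enum mig_state mig_next(enum mig_state cur)
{
	assert(cur < MIG_NUM_STATES);
	return cur < STOP_COPY ? (enum mig_state)(cur + 1) : STOP_COPY;
}

/*
 * Would entering 'next' risk blocking?  In this model only the arc into
 * PRE_COPY_STOP_P2P can wait, and only while PRIs are still open.
 * Entering PRE_COPY_STOP_PRI always returns immediately.
 */
static bool mig_arc_may_block(enum mig_state next, unsigned int open_pris)
{
	return next == PRE_COPY_STOP_P2P && open_pris > 0;
}
```

Userspace following this model would enter PRE_COPY_STOP_PRI first,
then either conclude the open PRIs before taking the next arc or give
up on the migration.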

In any event, the v2 is built as Alex and Cornelia were suggesting
with a minimal base feature set and two optional extensions for P2P
and PRE_COPY. Adding a 3rd extension for vPRI is completely
reasonable.

Further, from what I can understand devices doing PRI are incompatible
with the base feature set anyhow, as they cannot support a RUNNING ->
STOP_COPY transition without, minimally, completing all the open
vPRIs. As VMMs implementing the base protocol should stop the vCPU and
then move the device to STOP_COPY, it is inherently incompatible with
what you are proposing.
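
The conflict can be written down as a one-line predicate. This is only
a sketch: stop_copy_can_complete() is a hypothetical name, and the
model assumes (per the quoted discussion) that an open vPRI can only be
resolved while the vCPUs are running.

```c
#include <stdbool.h>

/*
 * Hypothetical model: an open vPRI needs the guest to service the fault
 * on its I/O page table, so it can only complete while vCPUs run.
 * Returns true if the device could finish the arc into STOP_COPY.
 */
static bool stop_copy_can_complete(bool vcpus_running, unsigned int open_vpris)
{
	return open_vpris == 0 || vcpus_running;
}
```

Under the base protocol the vCPUs are already stopped when the arc is
requested, so in this model any open vPRI makes the predicate false.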

The new vPRI enabled protocol would have to supersede the base
protocol and eliminate implicit transitions through the vPRI
maintenance states as these are non-transparent.

It is all stuff we can do in the FSM model, but it all needs a careful
think and an FSM design.

(There is also the interesting question of how to even detect this, as
the vPRI special cases should only ever exist if the device was bound
to a PRI-capable I/O page table. So a single device may or may not use
this depending on how it is bound, and at least right now things
assume these flags are static at device setup time, so hurm.)

Jason

