Re: MMIO/PIO dispatch file descriptors (ioregionfd) design discussion

From: "Michael S. Tsirkin" <mst@redhat.com>
To: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Peter Xu <peterx@redhat.com>,
	Elena Afanasova <eafanasova@gmail.com>,
	kvm@vger.kernel.org, john.g.johnson@oracle.com,
	dinechin@redhat.com, cohuck@redhat.com, jasowang@redhat.com,
	felipe@nutanix.com, elena.ufimtseva@oracle.com,
	jag.raman@oracle.com
Subject: Re: MMIO/PIO dispatch file descriptors (ioregionfd) design discussion
Date: Thu, 3 Dec 2020 06:34:00 -0500
Message-ID: <20201203062357-mutt-send-email-mst@kernel.org>
In-Reply-To: <20201203111036.GD689053@stefanha-x1.localdomain>

On Thu, Dec 03, 2020 at 11:10:36AM +0000, Stefan Hajnoczi wrote:
> On Wed, Dec 02, 2020 at 01:06:28PM -0500, Peter Xu wrote:
> > On Wed, Nov 25, 2020 at 12:44:07PM -0800, Elena Afanasova wrote:
> > 
> > [...]
> > 
> > > Wire protocol
> > > -------------
> > > The protocol spoken over the file descriptor is as follows. The device reads
> > > commands from the file descriptor with the following layout::
> > > 
> > >   struct ioregionfd_cmd {
> > >       __u32 info;
> > >       __u32 padding;
> > >       __u64 user_data;
> > >       __u64 offset;
> > >       __u64 data;
> > >   };
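A minimal sketch of how a device emulation process might pull one command off the ioregionfd. The struct mirrors the layout above, with `uint32_t`/`uint64_t` standing in for `__u32`/`__u64`. The bit layout of `info` (opcode, log2 access size, response-wanted flag) and every `HYP_*`/`hyp_*` name are hypothetical, because this excerpt does not define them.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Mirrors the proposed command layout; uint*_t stand in for __u32/__u64. */
struct ioregionfd_cmd {
    uint32_t info;
    uint32_t padding;
    uint64_t user_data;
    uint64_t offset;
    uint64_t data;
};

/* Naturally aligned fields: 32 bytes, no compiler-inserted padding. */
_Static_assert(sizeof(struct ioregionfd_cmd) == 32, "unexpected command size");

/* HYPOTHETICAL `info` encoding, for illustration only:
 * bits 0-3 opcode, bits 4-5 log2(access size), bit 6 response wanted. */
enum { HYP_CMD_READ = 0, HYP_CMD_WRITE = 1 };
#define HYP_OPCODE(info)      ((info) & 0xfu)
#define HYP_SIZE_BYTES(info)  (1u << (((info) >> 4) & 0x3u))
#define HYP_RESP_WANTED(info) (((info) >> 6) & 0x1u)

static uint32_t hyp_make_info(uint32_t opcode, uint32_t log2_size, uint32_t resp)
{
    return (opcode & 0xfu) | ((log2_size & 0x3u) << 4) | ((resp & 0x1u) << 6);
}

/* Read exactly one command. A stream fd may return short reads, so loop.
 * Returns 0 on success, -1 on EOF or error. */
static int read_cmd(int fd, struct ioregionfd_cmd *cmd)
{
    char *p = (char *)cmd;
    size_t left = sizeof(*cmd);

    assert(cmd != NULL);
    while (left > 0) {
        ssize_t n = read(fd, p, left);
        if (n < 0 && errno == EINTR)
            continue;           /* interrupted by a signal: retry */
        if (n <= 0)
            return -1;          /* EOF mid-command, or a real error */
        p += n;
        left -= (size_t)n;
    }
    return 0;
}
```

In a device loop one would call `read_cmd()` on the fd and dispatch on `HYP_OPCODE(cmd.info)`. The loop around `read()` matters because a stream socket may deliver a 32-byte command in pieces.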
> > 
> > I'm thinking whether it would be nice to have a handshake on the wire protocol
> > before starting the cmd/resp sequence.
> > 
> > I was thinking about migration: we have had a hard time trying to be
> > compatible between old/new qemus.  We have now fixed that by always applying
> > the same migration capabilities on both sides, so we do the handshake
> > "manually" from libvirt, but it really should be done with a real handshake
> > on the channel, imho.  That's another story, for sure.
> > 
> > My understanding is that the wire protocol is a standalone (but tiny)
> > protocol between kvm and the emulation process.  So I'm thinking the handshake
> > could also help in cases where e.g. kvm falls back to an old version of the
> > wire protocol if it knows the emulation binary is old.  Ideally, I think this
> > could even happen without the VMM's awareness.
> > 
> > [...]
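Peter's handshake idea could look something like the following, assuming each side advertises a hypothetical `[min_version, max_version]` range of wire-protocol versions numbered from 1, and both pick the newest version they have in common. Nothing like `struct hyp_hello` exists in the proposal; it only illustrates how KVM could fall back to an older version when talking to an old emulation binary.

```c
#include <assert.h>
#include <stdint.h>

/* HYPOTHETICAL hello message; the proposal defines no handshake. */
struct hyp_hello {
    uint32_t min_version;   /* oldest wire-protocol version spoken */
    uint32_t max_version;   /* newest wire-protocol version spoken */
};

/* Pick the newest version both ends support.
 * Versions start at 1, so 0 means the ranges do not overlap. */
static uint32_t hyp_negotiate(struct hyp_hello kvm, struct hyp_hello dev)
{
    uint32_t hi = kvm.max_version < dev.max_version ? kvm.max_version : dev.max_version;
    uint32_t lo = kvm.min_version > dev.min_version ? kvm.min_version : dev.min_version;

    assert(kvm.min_version <= kvm.max_version);
    assert(dev.min_version <= dev.max_version);
    return hi >= lo ? hi : 0;
}
```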
> 
> I imagined that would happen in the control plane (KVM ioctls) instead
> of the data plane (the fd). There is a flags field in
> ioctl(KVM_SET_IOREGION):
> 
>   struct kvm_ioregion {
>       __u64 guest_paddr; /* guest physical address */
>       __u64 memory_size; /* bytes */
>       __u64 user_data;
>       __s32 fd; /* previously created with KVM_CREATE_IOREGIONFD */
>       __u32 flags;
>       __u8  pad[32];
>   };
> 
> When userspace sets up the ioregionfd it can tell the kernel which
> features to enable.
> 
> Feature availability can be checked through ioctl(KVM_CHECK_EXTENSION).
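Stefan's alternative keeps negotiation in the control plane. A sketch, assuming the kernel reports the supported `flags` bits as a bitmask (for instance through `KVM_CHECK_EXTENSION`; how the real capability would report them is not specified here). `HYP_FEAT_A`, `HYP_FEAT_B` and the helper names are hypothetical, and `struct kvm_ioregion` is copied locally because it is only a proposal.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local copy of the proposed struct; it is not in mainline <linux/kvm.h>. */
struct kvm_ioregion {
    uint64_t guest_paddr; /* guest physical address */
    uint64_t memory_size; /* bytes */
    uint64_t user_data;
    int32_t  fd;          /* previously created with KVM_CREATE_IOREGIONFD */
    uint32_t flags;
    uint8_t  pad[32];
};

/* HYPOTHETICAL feature bits for the flags field. */
#define HYP_FEAT_A (1u << 0)
#define HYP_FEAT_B (1u << 1)

/* Choose flags from what the kernel reports as supported (assumed bitmask).
 * Returns the flags to request, or -1 if a required feature is missing. */
static int64_t hyp_select_flags(uint32_t supported, uint32_t wanted, uint32_t required)
{
    if ((required & supported) != required)
        return -1;                      /* kernel too old for this device */
    return (int64_t)((wanted | required) & supported);
}

/* Build the registration argument; reserved bytes are zeroed. */
static struct kvm_ioregion hyp_make_ioregion(uint64_t gpa, uint64_t size, int32_t fd,
                                             uint64_t user_data, uint32_t flags)
{
    struct kvm_ioregion r;

    assert(size > 0);                   /* an empty region makes no sense */
    memset(&r, 0, sizeof(r));
    r.guest_paddr = gpa;
    r.memory_size = size;
    r.user_data = user_data;
    r.fd = fd;
    r.flags = flags;
    return r;
}
```

The resulting struct would then be passed to the proposed `KVM_SET_IOREGION` ioctl, which this sketch does not call.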
> 
> Do you think this existing mechanism is enough? It's not clear to me
> what kind of additional negotiation would be necessary between the
> device emulation process and KVM after the ioregionfd has been
> registered.
> 
> > > Ordering
> > > --------
> > > Guest accesses are delivered in order, including posted writes.
> > 
> > I'm wondering whether it should prepare for out-of-order commands, given that
> > without a handshake the protocol would be harder to extend.  There could be
> > slow commands, and we would still want a chance to reply to a very trivial
> > command while handling the slow one (then each command may require a command
> > ID, too).  But it won't be a problem at all if we can easily extend the wire
> > protocol, so that the ordering constraint can be relaxed later when really
> > needed, and we can always start with in-order-only requests.
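If out-of-order completion were ever added, replies would have to be matched to commands by ID, as Peter notes. A sketch assuming a hypothetical per-command ID in the range 0..63 (the proposed `struct ioregionfd_cmd` has no such field; its `user_data` appears to come from the region registration rather than identify individual commands), tracked as a bitmap so that a fast command can complete while a slow one is still pending.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* HYPOTHETICAL: each command carries an ID in 0..63.
 * Bit n set means command n is still awaiting a reply. */
typedef uint64_t hyp_inflight_t;

static bool hyp_busy(hyp_inflight_t s, unsigned id)
{
    return id < 64 && ((s >> id) & 1u);
}

/* Mark a command as issued. */
static hyp_inflight_t hyp_issue(hyp_inflight_t s, unsigned id)
{
    assert(id < 64 && !hyp_busy(s, id));
    return s | (UINT64_C(1) << id);
}

/* A reply may arrive for any in-flight ID, not just the oldest one. */
static hyp_inflight_t hyp_complete(hyp_inflight_t s, unsigned id)
{
    assert(hyp_busy(s, id));
    return s & ~(UINT64_C(1) << id);
}

/* Lowest free ID for the next command, or -1 if all 64 are in flight. */
static int hyp_alloc_id(hyp_inflight_t s)
{
    for (unsigned id = 0; id < 64; id++)
        if (!hyp_busy(s, id))
            return (int)id;
    return -1;
}
```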
> 
> Elena and I brainstormed out-of-order commands but didn't include them
> in the proposal because it's not clear that they are needed. For
> multi-queue devices the per-queue registers can be assigned different
> ioregionfds that are handled by dedicated threads.
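A toy model of the multi-queue layout Stefan describes: each queue's doorbell is a separate region registered with its own ioregionfd, so an access is routed to the fd (and hence the thread) that owns that range. The addresses, sizes and fd numbers are made up, and in the proposal this routing happens inside KVM rather than in userspace.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One guest-physical range registered with its own ioregionfd. */
struct hyp_region {
    uint64_t gpa;   /* guest physical base address */
    uint64_t size;  /* bytes */
    int fd;         /* ioregionfd serving this range */
};

/* HYPOTHETICAL layout: a 4-byte doorbell per queue, each on its own fd,
 * so each fd can be served by its own thread. Values are made up. */
static const struct hyp_region hyp_example[] = {
    { 0xfe000000, 4, 10 },  /* queue 0 doorbell */
    { 0xfe000004, 4, 11 },  /* queue 1 doorbell */
};
#define HYP_NREGIONS (sizeof(hyp_example) / sizeof(hyp_example[0]))

/* Return the fd whose region fully contains [addr, addr + len), or -1.
 * Written to avoid overflow near the top of the address space. */
static int hyp_lookup_fd(const struct hyp_region *r, size_t n, uint64_t addr, uint64_t len)
{
    for (size_t i = 0; i < n; i++) {
        if (addr >= r[i].gpa && len <= r[i].size &&
            addr - r[i].gpa <= r[i].size - len)
            return r[i].fd;
    }
    return -1;
}
```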

The difficulty is I think the reverse: reading
any register from a PCI device is normally enough to flush any
writes and interrupts in progress.
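One way to read this concern: if per-queue registers live on different ioregionfds served by different threads, a read arriving on one fd may have to wait for writes still queued on the others, and per-fd in-order delivery does not provide that for free. A toy model, assuming posted writes are buffered per fd (an illustration only, not something the proposal specifies):

```c
#include <assert.h>
#include <stdint.h>

#define HYP_NQUEUES 2    /* one fd (and thread) per queue */
#define HYP_MAXPEND 16
#define HYP_NREGS   8

struct hyp_write { unsigned reg; uint32_t val; };

/* HYPOTHETICAL model: posted writes wait in a per-fd queue until applied. */
struct hyp_dev {
    uint32_t regs[HYP_NREGS];
    struct hyp_write pend[HYP_NQUEUES][HYP_MAXPEND];
    unsigned npend[HYP_NQUEUES];
};

static void hyp_post_write(struct hyp_dev *d, unsigned q, unsigned reg, uint32_t val)
{
    assert(q < HYP_NQUEUES && reg < HYP_NREGS && d->npend[q] < HYP_MAXPEND);
    d->pend[q][d->npend[q]++] = (struct hyp_write){ reg, val };
}

/* Apply every queued write; order is kept within each queue only. */
static void hyp_flush_all(struct hyp_dev *d)
{
    for (unsigned q = 0; q < HYP_NQUEUES; q++) {
        for (unsigned i = 0; i < d->npend[q]; i++)
            d->regs[d->pend[q][i].reg] = d->pend[q][i].val;
        d->npend[q] = 0;
    }
}

/* PCI-like read: first flush writes posted on *every* fd. */
static uint32_t hyp_read(struct hyp_dev *d, unsigned reg)
{
    assert(reg < HYP_NREGS);
    hyp_flush_all(d);
    return d->regs[reg];
}

/* Usage: a write posted via queue 1's fd, then a read of that register
 * (arriving on any fd). Without the flush the read sees a stale 0. */
static uint32_t hyp_demo(int flush)
{
    struct hyp_dev d = { 0 };

    hyp_post_write(&d, 1, 3, 7);
    return flush ? hyp_read(&d, 3) : d.regs[3];
}
```

`hyp_demo(1)` returns 7 and `hyp_demo(0)` returns the stale 0. The model still ignores the relative order of writes posted on different fds, which a real design would have to define.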

> Out-of-order commands are only necessary if a device needs to
> concurrently process register accesses to the *same* set of registers. I
> think it's rare for hardware register interfaces to be designed like
> that.
> 
> This could be a mistake, of course. If someone knows a device that needs
> multiple in-flight register accesses, please let us know.
> 
> Stefan


Thread overview: 33+ messages
2020-11-25 20:44 MMIO/PIO dispatch file descriptors (ioregionfd) design discussion Elena Afanasova
2020-12-02 18:06 ` Peter Xu
2020-12-03 11:10   ` Stefan Hajnoczi
2020-12-03 11:34     ` Michael S. Tsirkin [this message]
2020-12-04 13:23       ` Stefan Hajnoczi
2020-12-03 14:40     ` Peter Xu
2020-12-07 14:58       ` Stefan Hajnoczi
2021-10-12  5:34 ` elena
2021-10-25 12:42   ` Stefan Hajnoczi
2021-10-25 15:21     ` Elena
2021-10-25 16:56       ` Stefan Hajnoczi
2021-10-26 19:01       ` John Levon
2021-10-27 10:15         ` Stefan Hajnoczi
2021-10-27 12:22           ` John Levon
2021-10-28  8:14             ` Stefan Hajnoczi
     [not found] <CAFO2pHzmVf7g3z0RikQbYnejwcWRtHKV=npALs49eRDJdt4mJQ@mail.gmail.com>
2020-11-26  3:37 ` Jason Wang
2020-11-26 12:36   ` Stefan Hajnoczi
2020-11-27  3:39     ` Jason Wang
2020-11-27 13:44       ` Stefan Hajnoczi
2020-11-30  2:14         ` Jason Wang
2020-11-30 12:47           ` Stefan Hajnoczi
2020-12-01  4:05             ` Jason Wang
2020-12-01 10:35               ` Stefan Hajnoczi
2020-12-02  2:53                 ` Jason Wang
2020-12-02 14:17                 ` Elena Afanasova