From: Stefan Hajnoczi <stefanha@gmail.com>
To: Jason Wang <jasowang@redhat.com>
Cc: "Elena Ufimtseva" <elena.ufimtseva@oracle.com>,
"Janosch Frank" <frankja@linux.vnet.ibm.com>,
"mst@redhat.com" <mtsirkin@redhat.com>,
"John G Johnson" <john.g.johnson@oracle.com>,
qemu-devel <qemu-devel@nongnu.org>,
"Kirti Wankhede" <kwankhede@nvidia.com>,
"Gerd Hoffmann" <kraxel@redhat.com>,
"Yan Vugenfirer" <yan@daynix.com>,
"Jag Raman" <jag.raman@oracle.com>,
"Eugenio Pérez" <eperezma@redhat.com>,
"Anup Patel" <anup@brainfault.org>,
"Claudio Imbrenda" <imbrenda@linux.vnet.ibm.com>,
"Christian Borntraeger" <borntraeger@de.ibm.com>,
"Roman Kagan" <rkagan@virtuozzo.com>,
"Felipe Franciosi" <felipe@nutanix.com>,
"Marc-André Lureau" <marcandre.lureau@redhat.com>,
"Jens Freimann" <jfreimann@redhat.com>,
"Philippe Mathieu-Daudé" <philmd@redhat.com>,
"Stefano Garzarella" <sgarzare@redhat.com>,
"Eduardo Habkost" <ehabkost@redhat.com>,
"Sergio Lopez" <slp@redhat.com>,
"Kashyap Chamarthy" <kchamart@redhat.com>,
"Darren Kenny" <darren.kenny@oracle.com>,
"Alex Williamson" <alex.williamson@redhat.com>,
"Liran Alon" <liran.alon@oracle.com>,
"Stefan Hajnoczi" <stefanha@redhat.com>,
"Thanos Makatos" <thanos.makatos@nutanix.com>,
"Alex Bennée" <alex.bennee@linaro.org>,
"David Gibson" <david@gibson.dropbear.id.au>,
"Kevin Wolf" <kwolf@redhat.com>,
"Halil Pasic" <pasic@linux.vnet.ibm.com>,
"Daniel P. Berrange" <berrange@redhat.com>,
"Christophe de Dinechin" <dinechin@redhat.com>,
"Paolo Bonzini" <pbonzini@redhat.com>, fam <fam@euphon.net>
Subject: Re: Out-of-Process Device Emulation session at KVM Forum 2020
Date: Fri, 30 Oct 2020 13:15:04 +0000
Message-ID: <CAJSP0QX_=dbDB2k7H-6D19ns1_HuM2P5ZMtUrFN9H7WU8aDXCg@mail.gmail.com>
In-Reply-To: <95432b0c-919f-3868-b3f5-fc45a1eef721@redhat.com>
On Fri, Oct 30, 2020 at 12:08 PM Jason Wang <jasowang@redhat.com> wrote:
> On 2020/10/30 7:13 PM, Stefan Hajnoczi wrote:
> > On Fri, Oct 30, 2020 at 9:46 AM Jason Wang <jasowang@redhat.com> wrote:
> >> On 2020/10/30 2:21 PM, Stefan Hajnoczi wrote:
> >>> On Fri, Oct 30, 2020 at 3:04 AM Alex Williamson
> >>> <alex.williamson@redhat.com> wrote:
> >>>> It's great to revisit ideas, but proclaiming a uAPI is bad solely
> >>>> because the data transfer is opaque, without defining why that's bad,
> >>>> evaluating the feasibility and implementation of defining a well
> >>>> specified data format rather than protocol, including cross-vendor
> >>>> support, or proposing any sort of alternative is not so helpful imo.
> >>> The migration approaches in VFIO and vDPA/vhost were designed for
> >>> different requirements and I think this is why there are different
> >>> perspectives on this. Here is a comparison and how VFIO could be
> >>> extended in the future. I see 3 levels of device state compatibility:
> >>>
> >>> 1. The device cannot save/load state blobs; instead, userspace fetches
> >>> and restores specific values of the device's runtime state (e.g. last
> >>> processed ring index). This is the vhost approach.
> >>>
> >>> 2. The device can save/load state in a standard format. This is
> >>> similar to #1 except that there is a single read/write blob interface
> >>> instead of fine-grained get_FOO()/set_FOO() interfaces. This approach
> >>> pushes the migration state parsing into the device so that userspace
> >>> doesn't need knowledge of every device type. With this approach it is
> >>> possible for a device from vendor A to migrate to a device from vendor
> >>> B, as long as they both implement the same standard migration format.
> >>> The limitation of this approach is that vendor-specific state cannot
> >>> be transferred.
> >>>
> >>> 3. The device can save/load opaque blobs. This is the initial VFIO
> >>> approach.
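(To make the comparison concrete, here is a rough sketch of what the
three levels could look like as C interfaces. All names below are
made up for illustration; they are not real vhost or VFIO APIs.)

  #include <stdint.h>     /* uint64_t */
  #include <sys/types.h>  /* ssize_t */

  /* Level 1 (vhost-style): fine-grained accessors. Userspace must
   * know the meaning of every piece of device state. */
  int get_vring_base(int dev_fd, unsigned int vq_idx,
                     uint64_t *last_avail_idx);
  int set_vring_base(int dev_fd, unsigned int vq_idx,
                     uint64_t last_avail_idx);

  /* Levels 2 and 3: one save/load blob interface. The difference is
   * only whether the blob layout follows a published standard
   * (level 2) or is vendor-defined and opaque (level 3). */
  ssize_t save_state(int dev_fd, void *buf, size_t len);
  ssize_t load_state(int dev_fd, const void *buf, size_t len);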
> >>
> >> I still don't get why it must be opaque.
> > If the device state format needs to be in the VMM then each device
> > needs explicit enablement in each VMM (QEMU, cloud-hypervisor, etc).
> >
> > Let's invert the question: why does the VMM need to understand the
> > device state of a _passthrough_ device?
>
>
> For better manageability, compatibility and debuggability. If we depend
> on an opaque structure, do we encourage each device to implement its own
> migration protocol? That would be very challenging.
>
> For VFIO in the kernel, I suspect a uAPI that allows opaque data to be
> read from or written to the guest violates Linux uAPI principles. It
> will be very hard, or even impossible, to maintain the uABI. It looks to
> me like VFIO is the first subsystem trying to do this.
I think our concepts of uAPI are different. The uAPI of read(2) and
write(2) does not define the structure of the data buffers. VFIO
device regions are exactly the same: the structure of the data is not
defined by the kernel uAPI.

Maybe microcode and firmware loading are examples we agree on?
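To illustrate, here is a minimal sketch (not the actual VFIO migration
uAPI; the fd and offset handling are hypothetical) of how userspace
might copy opaque device state out through a device fd. The kernel
defines the transport; the payload belongs to the device:

  #include <stdint.h>
  #include <stdio.h>
  #include <unistd.h>

  /* Copy opaque state bytes from a device region to a file. The
   * uAPI specifies pread(2) on the region, not what the bytes mean,
   * just as the firmware loader moves blobs it never parses. */
  static int save_opaque_state(int device_fd, off_t data_offset,
                               FILE *out)
  {
      uint8_t buf[4096];
      ssize_t n;

      while ((n = pread(device_fd, buf, sizeof(buf),
                        data_offset)) > 0) {
          if (fwrite(buf, 1, (size_t)n, out) != (size_t)n)
              return -1; /* write error */
          data_offset += n;
      }
      return n < 0 ? -1 : 0; /* 0 on clean end of data */
  }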
> >>> A device from vendor A cannot migrate to a device from
> >>> vendor B because the format is incompatible. This approach works well
> >>> when devices have unique guest-visible hardware interfaces so the
> >>> guest wouldn't be able to handle migrating a device from vendor A to a
> >>> device from vendor B anyway.
> >>
> >> For VFIO, I guess cross-vendor live migration can't succeed unless we
> >> cheat with the device/vendor IDs.
> > Yes. I haven't looked into the details of PCI (Sub-)Device/Vendor IDs
> > and how to best enable migration but I hope that can be solved. The
> > simplest approach is to override the IDs and make them part of the
> > guest configuration.
>
>
> That would be very tricky (or would require a whitelist). E.g. the
> opaque data of the src may match the opaque data of the dst by chance.
Luckily, identifying things based on magic constants is a problem that
has been solved many times in the past.

A central identifier registry prevents all collisions but is a pain to
manage. Alternatively, use a 128-bit UUID and self-allocate the
identifier with an extremely low chance of collision:
https://en.wikipedia.org/wiki/Universally_unique_identifier#Collisions
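For example (a sketch only; the header layout below is something I
just made up, not an existing format): the vendor generates a UUID
once with uuidgen(1), hard-codes it, and the destination refuses any
blob whose format identifier it does not recognize:

  #include <stdint.h>
  #include <string.h>

  /* Self-allocated 128-bit format identifier; the value here is
   * arbitrary and would be generated once by the vendor. */
  static const uint8_t MY_FORMAT_UUID[16] = {
      0x3e, 0x1f, 0xa2, 0x10, 0x7b, 0x44, 0x4c, 0x9d,
      0x8a, 0x02, 0x5d, 0xc6, 0x91, 0x0e, 0x27, 0xb3,
  };

  /* Hypothetical header prepended to the opaque state blob. */
  struct state_header {
      uint8_t  format_uuid[16]; /* identifies the blob format */
      uint32_t version;         /* revision within that format */
  };

  static int state_format_compatible(const struct state_header *hdr)
  {
      return memcmp(hdr->format_uuid, MY_FORMAT_UUID,
                    sizeof(MY_FORMAT_UUID)) == 0;
  }

This avoids a central registry while keeping collisions vanishingly
unlikely.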
> >> At least for virtio, vendors will still go with virtio/vDPA. The
> >> advantages are:
> >>
> >> 1) virtio/vDPA can serve kernel subsystems that VFIO can't, which is
> >> very important for containers
> > I'm not sure I understand this. If the kernel wants to use the device
> > then it doesn't use VFIO; it runs the kernel driver instead.
>
>
> The current spec is not suitable for all types of devices. We've
> received a lot of feedback that virtio(pci) might not work very well.
> Another point is that there could be vendors that don't want to go with
> the virtio control path. The Mellanox mlx5 vDPA driver is one example.
> Yes, they could use mlx5_en, but there are vendors that want to build a
> vendor-specific control path from scratch.
Okay, I think I understand what you mean now. This is the reason why
vDPA exists.
Stefan