From: "Daniel P. Berrangé" <berrange@redhat.com>
To: Andrey Ryabinin <arbn@yandex-team.com>
Cc: qemu-devel@nongnu.org,
"Steve Sistare" <steven.sistare@oracle.com>,
yc-core@yandex-team.ru, "Tony Krowiak" <akrowiak@linux.ibm.com>,
"Halil Pasic" <pasic@linux.ibm.com>,
"Jason Herne" <jjherne@linux.ibm.com>,
"Cornelia Huck" <cohuck@redhat.com>,
"Thomas Huth" <thuth@redhat.com>,
"Alex Williamson" <alex.williamson@redhat.com>,
"Eric Farman" <farman@linux.ibm.com>,
"Matthew Rosato" <mjrosato@linux.ibm.com>,
"Paolo Bonzini" <pbonzini@redhat.com>,
"Eduardo Habkost" <eduardo@habkost.net>,
"Eric Blake" <eblake@redhat.com>,
"Markus Armbruster" <armbru@redhat.com>,
"Cleber Rosa" <crosa@redhat.com>,
"Philippe Mathieu-Daudé" <f4bug@amsat.org>,
"Wainer dos Santos Moschetta" <wainersm@redhat.com>,
"Beraldo Leal" <bleal@redhat.com>
Subject: Re: [PATCH 0/4] Allow to pass pre-created VFIO container/group to QEMU
Date: Mon, 17 Oct 2022 12:05:17 +0100
Message-ID: <Y0027XOMm/lfftGK@redhat.com>
In-Reply-To: <20221017105407.3858-1-arbn@yandex-team.com>
On Mon, Oct 17, 2022 at 01:54:03PM +0300, Andrey Ryabinin wrote:
> These patches add the possibility to pass a VFIO device to QEMU using file
> descriptors of the VFIO container/group, instead of having QEMU create them.
> This allows taking away the permission to open /dev/vfio/* from QEMU and
> delegating it to a management layer like libvirt.
>
> The VFIO API doesn't allow passing just the device fd, since we also need
> the VFIO container and group. So these patches allow passing a pre-created
> VFIO container/group to QEMU via the command line/QMP, e.g. like this:
> -object vfio-container,id=ct,fd=5 \
> -object vfio-group,id=grp,fd=6,container=ct \
> -device vfio-pci,host=05:00.0,group=grp
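For illustration, a management layer could prepare the two descriptors roughly as in the C sketch below. It only opens /dev/vfio/vfio and the group node, then checks the API version and that the group is viable. It assumes, as the container= property suggests, that QEMU itself attaches the group to the container and selects the IOMMU type; the helper name vfio_open_fds is hypothetical and not part of the patches.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/vfio.h>

/*
 * Hypothetical helper: open a VFIO container and the node for IOMMU group
 * group_id. Assumes QEMU performs VFIO_GROUP_SET_CONTAINER / VFIO_SET_IOMMU
 * itself, so only opening and sanity checks happen here.
 * Returns 0 on success, -1 on failure without leaking descriptors.
 */
static int vfio_open_fds(int group_id, int *container_out, int *group_out)
{
    struct vfio_group_status status = { .argsz = sizeof(status) };
    char path[64];
    int container, group;

    /* No O_CLOEXEC: the fds must survive exec() into QEMU. */
    container = open("/dev/vfio/vfio", O_RDWR);
    if (container < 0)
        return -1;
    if (ioctl(container, VFIO_GET_API_VERSION) != VFIO_API_VERSION)
        goto close_container;

    snprintf(path, sizeof(path), "/dev/vfio/%d", group_id);
    group = open(path, O_RDWR);
    if (group < 0)
        goto close_container;

    /* A group is only usable when every device in it is bound to vfio. */
    if (ioctl(group, VFIO_GROUP_GET_STATUS, &status) < 0 ||
        !(status.flags & VFIO_GROUP_FLAGS_VIABLE)) {
        close(group);
        goto close_container;
    }

    *container_out = container;
    *group_out = group;
    return 0;

close_container:
    close(container);
    return -1;
}
```

A launcher would then dup2() the two descriptors onto the numbers named on the command line (5 and 6 in the example above) before exec'ing QEMU, so QEMU itself never needs permission to open /dev/vfio/*.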
>
> A more detailed example can be found in the test:
> tests/avocado/vfio.py
>
> *Possible future steps*
>
> These patches could also be a step towards local (same-host) migration
> of QEMU with VFIO devices.
> I've built a prototype on top of these patches to try out this idea.
> In short, the migration scheme is as follows:
> - migrate the source VM to a file.
> - retrieve the fd numbers of the VFIO container/group/device via a new property and the qom-get command.
> - get the actual file descriptors via SCM_RIGHTS using a new QMP command 'returnfd', which
>   sends an fd from QEMU by its number: { 'command': 'returnfd', 'data': {'fd': 'int'}}
> - shut down the source VM.
> - launch the destination VM and plug the VFIO devices using the obtained file descriptors.
> - PCI device reset while plugging the device is avoided with the help of a new parameter
>   on the vfio-pci device.
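The fd-return step relies on SCM_RIGHTS ancillary data over a UNIX socket. A generic C sketch of both ends follows; send_fd() and recv_fd() are hypothetical helpers, not the prototype's actual 'returnfd' code. The kernel installs a new descriptor in the receiving process, while the sender keeps its own copy open until it calls close() itself.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Control buffer for exactly one fd, aligned for struct cmsghdr. */
typedef union {
    char buf[CMSG_SPACE(sizeof(int))];
    struct cmsghdr align;
} one_fd_cmsg;

/* Send fd over a connected AF_UNIX socket; returns 0 or -1. The caller's
 * own copy of fd stays open until it closes it explicitly. */
static int send_fd(int sock, int fd)
{
    char byte = 0; /* SOCK_STREAM needs at least one data byte, see unix(7) */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    one_fd_cmsg u;
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = u.buf, .msg_controllen = sizeof(u.buf),
    };
    struct cmsghdr *cmsg;

    memset(&u, 0, sizeof(u));
    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one fd; returns the newly installed descriptor or -1. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    one_fd_cmsg u;
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = u.buf, .msg_controllen = sizeof(u.buf),
    };
    struct cmsghdr *cmsg;
    int fd;

    if (recvmsg(sock, &msg, 0) != 1 || (msg.msg_flags & MSG_CTRUNC))
        return -1;
    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

Between sendmsg() returning and the sender's close(), both processes hold a descriptor for the same open file description.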

Is there a restriction by VFIO on how many processes can have the FD
open concurrently? I guess it must be allowed, as with SCM_RIGHTS, both
src QEMU and libvirt will have the FD open concurrently for at least a
short period, since you can't atomically close the FD at the exact same
time as SCM_RIGHTS sends it.

With migration it is *highly* desirable never to stop the source VM's
QEMU until the new QEMU has completed migration and got its vCPUs
running, in order to have the best chance of a successful rollback
upon failure.

So assuming both QEMUs can have the FD open, provided they don't
both operate on it concurrently, could src QEMU just pass the FDs
to the target QEMU as part of the migration stream? E.g. use a UNIX
socket between the 2 QEMUs, and SCM_RIGHTS to pass the FDs across,
avoiding libvirt needing to be in the middle of the FD passing
dance. Since target QEMU gets the FDs as part of the migration
stream, it would inherently know that it should skip device reset
in that flow, without requiring any new param.
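One possible shape for passing the FDs directly between the two QEMUs is a single sendmsg() that carries a small record naming the device together with its container/group/device FDs as SCM_RIGHTS data. The record layout, struct handoff_record and both helpers below are hypothetical, and how such messages would be interleaved with the real migration stream is not shown. In this sketch the receiver treats the arrival of FDs as the signal that the device is already live and must not be reset.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

#define HANDOFF_MAX_FDS 3 /* container, group, device */

/* Hypothetical record identifying which device the attached FDs belong to. */
struct handoff_record {
    char device_id[32]; /* e.g. the id= of the vfio-pci device */
    int nfds;           /* number of FDs carried as SCM_RIGHTS data */
};

/* Control buffer large enough for the maximum number of FDs, aligned. */
typedef union {
    char buf[CMSG_SPACE(sizeof(int) * HANDOFF_MAX_FDS)];
    struct cmsghdr align;
} handoff_cmsg_buf;

/* Source side: send the record and 1..HANDOFF_MAX_FDS descriptors at once. */
static int handoff_send(int sock, const char *device_id,
                        const int *fds, int nfds)
{
    struct handoff_record rec = { .nfds = nfds };
    struct iovec iov = { .iov_base = &rec, .iov_len = sizeof(rec) };
    struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1 };
    handoff_cmsg_buf u;
    struct cmsghdr *cmsg;

    if (nfds < 1 || nfds > HANDOFF_MAX_FDS)
        return -1;
    strncpy(rec.device_id, device_id, sizeof(rec.device_id) - 1);

    memset(&u, 0, sizeof(u));
    msg.msg_control = u.buf;
    msg.msg_controllen = CMSG_SPACE(sizeof(int) * nfds);
    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int) * nfds);
    memcpy(CMSG_DATA(cmsg), fds, sizeof(int) * nfds);

    return sendmsg(sock, &msg, 0) == (ssize_t)sizeof(rec) ? 0 : -1;
}

/*
 * Target side: returns the number of FDs stored in fds[], 0 if the record
 * legitimately carried none, or -1 on error. A positive count means the
 * device is already live, so the caller would skip the PCI reset.
 */
static int handoff_recv(int sock, struct handoff_record *rec, int *fds)
{
    struct iovec iov = { .iov_base = rec, .iov_len = sizeof(*rec) };
    handoff_cmsg_buf u;
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = u.buf, .msg_controllen = sizeof(u.buf),
    };
    struct cmsghdr *cmsg;
    int n;

    if (recvmsg(sock, &msg, 0) != (ssize_t)sizeof(*rec))
        return -1;
    if (msg.msg_flags & MSG_CTRUNC) /* surplus FDs were discarded */
        return -1;
    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL)
        return rec->nfds == 0 ? 0 : -1;
    if (cmsg->cmsg_level != SOL_SOCKET || cmsg->cmsg_type != SCM_RIGHTS)
        return -1;
    n = (int)((cmsg->cmsg_len - CMSG_LEN(0)) / sizeof(int));
    if (n != rec->nfds)
        return -1;
    memcpy(fds, CMSG_DATA(cmsg), sizeof(int) * n);
    return n;
}
```

The MSG_CTRUNC check matters: if the receiver's control buffer is too small, the kernel discards the descriptors that do not fit, and the target would otherwise proceed with an incomplete set.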

> This is an alternative to the 'cpr-exec' migration scheme proposed here:
> https://lore.kernel.org/qemu-devel/1658851843-236870-1-git-send-email-steven.sistare@oracle.com/
> Unlike cpr-exec, it doesn't require the new kernel flags VFIO_DMA_UNMAP_FLAG_VADDR/VFIO_DMA_MAP_FLAG_VADDR,
> and it doesn't require a new migration mode, just some additional steps from the management layer.

Avoiding creating a whole new set of mgmt commands in QMP does
make this appealing as an option instead of cpr-exec. If we can
layer FD passing into the migration stream too, that'd be even
more compelling IMHO.

With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|