From: Christoph Hellwig <hch@infradead.org>
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: kvm@vger.kernel.org, "Michael S. Tsirkin" <mst@redhat.com>,
maorg@nvidia.com, virtualization@lists.linux-foundation.org,
Christoph Hellwig <hch@infradead.org>,
jiri@nvidia.com, leonro@nvidia.com
Subject: Re: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device
Date: Tue, 10 Oct 2023 23:26:42 -0700
Message-ID: <ZSZAIl06akEvdExM@infradead.org>
In-Reply-To: <20231010131031.GJ3952@nvidia.com>
On Tue, Oct 10, 2023 at 10:10:31AM -0300, Jason Gunthorpe wrote:
> We've talked around ideas like allowing the VF config space to do some
> of the work. For simple devices we could get away with 1 VF config
> space register. (VF config space is owned by the hypervisor, not the
> guest)
Which assumes you're actually using VFs and not multiple PFs, which
is a very limiting assumption. It also prevents you from actually
using DMA during the live migration process, which again is a major
limitation once you have a non-trivial amount of state.
> SIOVr2 is discussing a more flexible RID mapping - there is a possible
> route where a "VF" could actually have two RIDs, a hypervisor RID and a
> guest RID.
Well, then you go down the SIOV route, which requires a complex driver
actually presenting the guest visible device anyway.
> It really is PCI limitations that force this design of making a PF
> driver do dual duty as a fully functional normal device and act as a
> communication channel proxy to make a back channel into a SRIOV VF.
>
> My view has always been that the VFIO live migration operations are
> executed logically within the VF as they only affect the VF.
>
> So we have a logical design separation where VFIO world owns the
> commands and the PF driver supplies the communication channel. This
> works well for devices that already have a robust RPC interface to
> their device FW.
Independent of my above doubts about VF-controlled live migration
for PCIe devices, I absolutely agree with you that the Linux
abstraction and user interface should be VF based. That further
reinforces my point that the VFIO driver for the controlled function
(PF or VF) and the Linux driver for the controlling function (better
be a PF in practice) must be very tightly integrated. And the best
way to do that is to export the vfio nodes from the Linux driver
that knows the hardware and not split them out into a separate one.
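For reference, the split Jason describes (VFIO owns the commands, the PF driver only supplies the channel) can be pictured roughly as below. This is a minimal userspace sketch under stated assumptions: `struct pf_channel_ops`, `exec_cmd`, `vfio_vf_suspend` and the opcodes are hypothetical names, not real kernel or virtio APIs, and the toy PF transport stands in for a real admin queue.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical sketch, not a real kernel API: the PF driver only
 * supplies a transport (exec_cmd); the VFIO side defines and builds
 * the migration commands that travel over it.
 */
struct pf_channel_ops {
	/* Relay cmd to device FW on behalf of VF vf_id; 0 on success. */
	int (*exec_cmd)(void *pf, uint16_t vf_id, const void *cmd,
			size_t cmd_len, void *resp, size_t resp_len);
};

struct pf_channel {
	const struct pf_channel_ops *ops;
	void *pf; /* opaque PF driver state */
};

/* Command layout owned by the VFIO side (illustrative opcodes). */
enum { MIG_CMD_SUSPEND = 1, MIG_CMD_RESUME = 2 };

struct mig_cmd {
	uint16_t opcode;
	uint16_t vf_id;
};

/* VFIO-side operation: build the command, let the PF carry it. */
static int vfio_vf_suspend(struct pf_channel *ch, uint16_t vf_id)
{
	struct mig_cmd cmd = { .opcode = MIG_CMD_SUSPEND, .vf_id = vf_id };
	uint32_t status = 0;
	int ret;

	assert(ch && ch->ops && ch->ops->exec_cmd);
	ret = ch->ops->exec_cmd(ch->pf, vf_id, &cmd, sizeof(cmd),
				&status, sizeof(status));
	return ret ? ret : (int)status;
}

/* Toy PF transport for illustration: records the last relayed command. */
struct toy_pf {
	struct mig_cmd last;
	int calls;
};

static int toy_exec_cmd(void *pf, uint16_t vf_id, const void *cmd,
			size_t cmd_len, void *resp, size_t resp_len)
{
	struct toy_pf *t = pf;

	(void)vf_id;
	if (cmd_len != sizeof(t->last) || resp_len < sizeof(uint32_t))
		return -1;
	memcpy(&t->last, cmd, sizeof(t->last));
	t->calls++;
	memset(resp, 0, resp_len); /* status 0 means success */
	return 0;
}

static const struct pf_channel_ops toy_ops = { .exec_cmd = toy_exec_cmd };
```

The PF side never interprets `struct mig_cmd`; Hellwig's counterpoint is that in practice both halves need such intimate knowledge of the hardware that they belong in the same driver.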
> > The driver that knows this hardware. In this case the virtio subsystem,
> > in case of nvme the nvme driver, and in case of mlx5 the mlx5 driver.
>
> But those are drivers operating the HW to create kernel devices. Here
> we need a VFIO device. They can't co-exist, if you switch mlx5 from
> normal to vfio you have to tear down the entire normal driver.
Yes, absolutely. And if we're smart enough we structure it in a way
that we never even initialize the bits of the driver only needed for
the normal kernel consumers.
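A probe path structured that way might look roughly like the sketch below: one driver that knows the hardware picks the mode up front, and the VFIO path skips the setup that only kernel consumers need. All names (`fn_probe`, `init_kernel_consumers`, `register_vfio_node`, ...) are hypothetical placeholders, not real kernel functions.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical sketch: the driver that knows the hardware decides at
 * probe time whether the function serves kernel consumers or is
 * exported through VFIO. The VFIO path never runs consumer-only setup.
 */
enum fn_mode { FN_MODE_KERNEL, FN_MODE_VFIO };

struct fn_state {
	enum fn_mode mode;
	bool hw_core_ready;   /* needed in both modes */
	bool consumer_ready;  /* e.g. netdev/block queues: kernel mode only */
	bool vfio_registered; /* VFIO node exported: VFIO mode only */
};

static void init_hw_core(struct fn_state *fn)          { fn->hw_core_ready = true; }
static void init_kernel_consumers(struct fn_state *fn) { fn->consumer_ready = true; }
static void register_vfio_node(struct fn_state *fn)    { fn->vfio_registered = true; }

static int fn_probe(struct fn_state *fn, enum fn_mode mode)
{
	assert(fn);
	fn->mode = mode;
	init_hw_core(fn);
	if (mode == FN_MODE_VFIO) {
		/* Skip everything only normal kernel consumers need. */
		register_vfio_node(fn);
		return 0;
	}
	init_kernel_consumers(fn);
	return 0;
}
```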
> > No. That layout logically follows from what codebase the functionality
> > is part of, though.
>
> I don't understand what we are talking about really. Where do you
> imagine the vfio_register_XX() goes?
In the driver controlling the hardware. E.g. for virtio in
drivers/virtio/ and for nvme in drivers/nvme/ and for mlx5
in the mlx5 driver directory.
> > > I don't know what "fake-legacy" even means, VFIO is not legacy.
> >
> > The driver we're talking about in this thread fakes up a virtio_pci
> > legacy device to the guest on top of a "modern" virtio_pci device.
>
> I'm not sure I'd use the word fake, inb/outb are always trapped
> operations in VMs. If the device provided a real IO BAR then VFIO
> common code would trap and relay inb/outb to the device.
>
> All this is doing is changing the inb/outb relay from using a physical
> IO BAR to a DMA command ring.
>
> The motivation is simply because normal IO BAR space is incredibly
> limited and you can't get enough SRIOV functions when using it.
"Fake" is not meant as a judgement. But it creates a virtio-legacy
device that in this form does not exist in hardware. That's what
I call fake. If you prefer a different term that's fine with me too.
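To make Jason's description concrete: instead of forwarding a trapped guest `outb` to a physical I/O BAR, the host turns it into a command on a memory ring that the device fetches by DMA. The sketch below shows only the enqueue step, under stated assumptions: the opcodes, `struct legacy_io_cmd` layout and `relay_outb` are illustrative and do not reproduce the virtio admin command encoding, and a real relay would also wait for completion (mandatory for `inb`, whose result the trapped vCPU needs).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical sketch: a trapped legacy I/O port write becomes a
 * command on a ring the device reads by DMA. Opcodes and layout are
 * illustrative, not the virtio admin command encoding.
 */
enum { LEGACY_CFG_READ = 1, LEGACY_CFG_WRITE = 2 };

struct legacy_io_cmd {
	uint8_t  opcode;
	uint8_t  size;   /* 1, 2 or 4 bytes (outb/outw/outl) */
	uint16_t offset; /* offset within the emulated legacy I/O region */
	uint32_t data;   /* value for writes */
};

#define RING_SIZE 8

struct cmd_ring {
	struct legacy_io_cmd slots[RING_SIZE];
	uint32_t head; /* producer index (host driver) */
	uint32_t tail; /* consumer index (device) */
};

/* Returns 0 if queued, -1 if the ring is full. */
static int ring_push(struct cmd_ring *r, const struct legacy_io_cmd *c)
{
	assert(r && c);
	if (r->head - r->tail == RING_SIZE)
		return -1;
	r->slots[r->head % RING_SIZE] = *c;
	r->head++;
	return 0;
}

/* Called when the guest's outb() to the emulated legacy BAR traps. */
static int relay_outb(struct cmd_ring *r, uint16_t offset, uint8_t val)
{
	struct legacy_io_cmd c = {
		.opcode = LEGACY_CFG_WRITE, .size = 1,
		.offset = offset, .data = val,
	};
	return ring_push(r, &c);
}
```

The point of the design choice is that a ring in memory does not consume scarce I/O port space per function, which is the SR-IOV scaling limit Jason mentions.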
> > > There is a lot of code in VFIO and the VMM side to take a VF and turn
> > > it into a vPCI function. You can't just trivially duplicate VFIO in a
> > > dozen drivers without creating a giant mess.
> >
> > I do not advocate for duplicating it. But the code that calls this
> > functionality belongs in the driver that deals with the compound
> > device that we're doing this work for.
>
> On one hand, I don't really care - we can put the code where people
> like.
>
> However - the Intel GPU VFIO driver is such a bad experience I don't
> want to encourage people to make VFIO drivers, or code that is only
> used by VFIO drivers, that are not under drivers/vfio review.
We can and should require vfio review for users of the vfio API.
But to be honest code placement was not the problem with i915. The
problem was that the mdev APIs (under drivers/vfio) were a complete
trainwreck when it was written, and that the driver had a horrible
hypervisor API abstraction.
> Be aware, there is a significant performance concern here. If you want
> to create 1000 VFIO devices (this is a real thing), we *can't* probe a
> normal driver first, it is too slow. We need a path that goes directly
> from creating the RIDs to turning those RIDs into VFIO.
And by calling the vfio functions from mlx5 you get this easily.
But I think you're totally mixing things up here anyway.
For mdev/SIOV-like flows you must call vfio APIs from the main
driver, as there is no pci_dev to probe on at all. That's
what i915 does btw.
For "classic" vfio that requires a pci_dev (or $otherbus_dev) we need
to have a similar flow. And I think the best way is to have a
bus-level attribute on the device and/or a device-specific side band
protocol to decide how new functions are probed. With that you
avoid all the duplicate PCI IDs for the binding, and actually allow us
to sanely establish a communication channel between the functions.
Because without that there is no way to know how any two functions
relate. The driver might think it knows, but there are all kinds of
wacky PCI passthrough schemes that will break such logic.
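One way to picture the bus-level attribute idea: when the controlling function creates a controlled function, it records both how that function should be probed and which function controls it, and the bus consults that record instead of matching on a duplicate device ID. This is a minimal sketch of only the attribute variant (not the side-band protocol), and every name in it (`struct func_attr`, `controller_create_func`, `bus_probe`) is hypothetical rather than an existing PCI core interface.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical sketch: the controlling function tags each function it
 * creates with a probe policy and a pointer back to itself, so the
 * bus picks the driver from the attribute and the VFIO driver knows
 * exactly which function provides its communication channel.
 */
enum probe_policy { PROBE_DEFAULT, PROBE_VFIO };
enum bound_driver { BOUND_NONE, BOUND_NATIVE, BOUND_VFIO };

struct func;

struct func_attr {
	enum probe_policy policy;
	struct func *controller; /* function that owns the side channel */
};

struct func {
	int id;
	struct func_attr attr;
	enum bound_driver bound;
};

/* Controlling driver: create a controlled function tagged for VFIO. */
static void controller_create_func(struct func *ctrl, struct func *fn, int id)
{
	fn->id = id;
	fn->attr.policy = PROBE_VFIO;
	fn->attr.controller = ctrl;
	fn->bound = BOUND_NONE;
}

/* Bus probe: choose the driver from the attribute, not the device ID. */
static void bus_probe(struct func *fn)
{
	if (fn->attr.policy == PROBE_VFIO) {
		/* A VFIO-bound function must know who controls it. */
		assert(fn->attr.controller != NULL);
		fn->bound = BOUND_VFIO;
	} else {
		fn->bound = BOUND_NATIVE;
	}
}
```

Because the relationship is recorded explicitly by the controlling function, nothing has to infer it from PCI topology, which is exactly what the passthrough schemes mentioned above can break.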
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization