From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Date: Wed, 25 Aug 2021 15:13:48 -0300
From: Jason Gunthorpe
Subject: Re: [virtio-comment] Live Migration of Virtio Virtual Function
Message-ID: <20210825181348.GL1721383@nvidia.com>
References: <39536c3c-e455-5602-9391-0b21add7e22f@nvidia.com>
 <0d06c26e-f1e7-3cac-a017-059e8985bb44@redhat.com>
 <74151019-6f78-2bff-5b0a-b5a4da814787@nvidia.com>
 <41fbd78a-f1d8-9056-3929-1e7b6b57a49b@nvidia.com>
 <0252a058-f3d2-db34-08a0-02c3cdd0e0bb@nvidia.com>
 <20210824131007.GT1721383@nvidia.com>
In-Reply-To:
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
To: Jason Wang
Cc: Max Gurtovoy, "Dr. David Alan Gilbert",
 "virtio-comment@lists.oasis-open.org", "Michael S. Tsirkin",
 "cohuck@redhat.com", Parav Pandit, Shahaf Shuler, Ariel Adam,
 Amnon Ilan, Bodong Wang, Stefan Hajnoczi, Eugenio Perez Martin,
 Liran Liss, Oren Duer
List-ID:

On Wed, Aug 25, 2021 at 12:58:01PM +0800, Jason Wang wrote:
> On Tue, Aug 24, 2021 at 9:10 PM Jason Gunthorpe wrote:
> >
> > On Tue, Aug 24, 2021 at 10:41:54AM +0800, Jason Wang wrote:
> >
> > > > migration exposed to the guest ? No.
> > >
> > > Can you explain why?
> >
> > For the SRIOV case migration is a privileged operation of the
> > hypervisor. The guest must not be allowed to interact with it in any
> > way otherwise the hypervisor migration could be attacked from the
> > guest and this has definite security implications.
> >
> > In practice this means that nothing related to migration can be
> > located on the MMIO pages/queues/etc of the VF. The reasons for this
> > are a bit complicated and has to do with the limitations of IO
> > isolation with VFIO - eg you can't reliably split a single PCI BDF
> > into hypervisor/guest security domains without PASID.
>
> So exposing the migration function can be done indirectly:
>
> In L0, the hardware implements the function via PF, Qemu will present
> an emulated PCI device then Qemu can expose those functions via a
> capability for L1 guests. When L1 driver tries to use those functions,
> it goes:
>
> L1 virtio-net driver -(emulated PCI-E BAR)-> Qemu -(ioctl)-> L0 kernel
> VF driver -> L0 kernel PF driver -(virtio interface)-> virtio PF
>
> In this approach, there's no way for the L1 driver to control the or
> see what is implemented in the hardware (PF). The details were hidden
> by Qemu. This works even if DMA is required for the L0 kernel PF
> driver to talk with the hardware since for L1 we didn't present a DMA
> interface. With the future PASID support, we can even present a DMA
> interface to L1.

Sure, you can do this, but that isn't what is being talked about here,
and honestly seems like a highly contrived use case.

Further, in this mode I'd expect the hypervisor kernel driver to
provide the migration support without requiring any special HW
function.

> > I see in this thread that these two things are becoming quite
> > confused. They are very different, have different security postures
> > and use different parts of the hypervisor stack, and intended for
> > quite different use cases.
>
> It looks like the full PCI VF could go via the virtio-pci vDPA driver
> as well (drivers/vdpa/virtio-pci). So what's the advantages of
> exposing the migration of virtio via vfio instead of vhost-vDPA?

Can't say, both are possibly valid approaches with different
trade offs.

Off hand I think it is just unneeded complexity to use VDPA if the
device is already exposing a fully functional virtio-pci interface. I
see VDPA as being useful to create HW accelerated virtio interface
from HW that does not natively speak full virtio.

> 1) migration compatibility with the existing software virtio and
> vhost/vDPA implementations

IMHO the virtio spec should define the format of the migration state
and I'd expect interworking between all the different implementations.

> > I agree it would be good spec design to have a general concept of a
> > secure and guest world and specific sections that defines how it works
> > for different scenarios, but that seems like a language remark and not
> > one about the design. For instance the admin queue Max is adding is
> > clearly part of the secure world and putting it on the PF is the only
> > option for the SRIOV mode.
>
> Yes, but let's move common functionality that is required for all
> transports to the chapter of "basic device facility". We don't need to
> define how it works in other different scenarios now.

It seems like a reasonable way to write the spec. I'd define a secure
admin queue and define how the ops on that queue work.

Then separately define how to instantiate the secure admin queue in
all the relevant scenarios.

Jason