On 2015-10-22 02:39, Alex Williamson wrote:
> On Thu, 2015-10-22 at 00:52 +0800, Lan Tianyu wrote:
>> This patchset is the QEMU part of live migration support for SR-IOV NICs.
>> The kernel part is at the following link:
>> http://marc.info/?l=kvm&m=144544635330193&w=2
>>
>>
>> Lan Tianyu (3):
>>   Qemu: Add pci-assign.h to share functions and struct definition with
>>     new file
>>   Qemu: Add post_load_state() to run after restoring CPU state
>>   Qemu: Introduce pci-sriov device type to support VF live migration
>>
>>  hw/i386/kvm/Makefile.objs   |   2 +-
>>  hw/i386/kvm/pci-assign.c    | 113 +----------------------
>>  hw/i386/kvm/pci-assign.h    | 109 +++++++++++++++++++++++
>>  hw/i386/kvm/sriov.c         | 213 ++++++++++++++++++++++++++++++++++++++++++++
>>  include/migration/vmstate.h |   2 +
>>  migration/savevm.c          |  15 ++++
>>  6 files changed, 344 insertions(+), 110 deletions(-)
>>  create mode 100644 hw/i386/kvm/pci-assign.h
>>  create mode 100644 hw/i386/kvm/sriov.c
>>
> Hi Lan,

Hi Alex:
        Thanks a lot for your comments. They're very helpful.

>
> Seems like there are a couple immediate problems with this approach.
> The first is that you're modifying legacy KVM device assignment, which
> is deprecated upstream and not even enabled by some distros.  VFIO is
> the supported mechanism for doing PCI device assignment now and any
> features like this need to be added there first.  It's not only more
> secure than legacy KVM device assignment, but it also doesn't limit this
> to an x86-only solution.  Surely you want to support 82599 VF migration
> on other platforms as well.

Yes, we will switch to VFIO. We used legacy device assignment only to
show our idea as soon as possible.

>
> Using sysfs to interact with the PF is also problematic since that means
> that libvirt needs to grant qemu access to these files, adding one more
> layer to the stack.  If we were to use VFIO, we could potentially enable
> this through a save-state region on the device file descriptor and if
> necessary, virtual interrupt channels for the device as well.  This of
> course implies that the kernel internal channels are made as general as
> possible in order to support any PF driver.

This sounds reasonable.
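A minimal sketch of how QEMU might pull the saved VF state out of such a
save-state region, assuming the region exists on the VFIO device fd. VFIO
regions are accessed with pread()/pwrite() on the device fd at the offset
reported by VFIO_DEVICE_GET_REGION_INFO; the save-state region itself and
the helper read_state_region() are hypothetical.

```c
#define _XOPEN_SOURCE 700  /* for pread() */
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: read 'len' bytes of saved device state from the
 * region starting at 'region_off' on the device fd.  In a VFIO design the
 * offset would come from VFIO_DEVICE_GET_REGION_INFO.  Retries on EINTR
 * and short reads.  Returns 0 on success, -1 on error or premature EOF.
 */
static int read_state_region(int fd, off_t region_off, void *buf, size_t len)
{
    uint8_t *p = buf;
    size_t done = 0;

    while (done < len) {
        ssize_t n = pread(fd, p + done, len - done, region_off + (off_t)done);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted, retry the same chunk */
            return -1;
        }
        if (n == 0)
            return -1;          /* region shorter than expected */
        done += (size_t)n;
    }
    return 0;
}
```

Keeping the transfer on the device fd means libvirt only has to hand QEMU
the fd it already passes for VFIO, rather than extra sysfs files.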

>
> That said, there are some nice features here.  Using unused PCI config
> bytes to communicate with the guest driver and enable guest-based page
> dirtying is a nice hack.  However, if we want to add this capability to
> other devices, we're not always going to be able to use fixed addresses
> 0xf0 and 0xf1.  I would suggest that we probably want to create a
> virtual capability in the config space of the VF, perhaps a Vendor
> Specific capability.  Obviously some devices won't have room for a full
> capability in the standard config space, so we may need to optionally
> expose it in extended config space.  Those devices would be limited to
> only supporting migration in PCI-e configurations in the guest.  Also,
> plenty of devices make use of undefined PCI config space, so we may not
> be able to simply add a capability to a region we think is unused, maybe
> it needs to happen through reserved space in another capability or
> perhaps defining a virtual BAR that unenlightened guest drivers would
> ignore.  The point is that we somehow need to standardize this rather
> than implicitly knowing that it's at 0xf0/0xf1 on 82599 VFs.

Yes, we used 0xF0 and 0xF1 only to show the idea; finding a suitable
place will need more effort. We will research this further.
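For the virtual capability approach, a minimal sketch of how QEMU or a
guest driver could locate a Vendor Specific capability (ID 0x09) by walking
the standard capability list instead of relying on fixed offsets. It works
on a 256-byte copy of config space; the helper find_pci_cap() and the
mig_vndr_cap layout are hypothetical.

```c
#include <stdint.h>

/* Standard PCI config-space offsets and IDs (PCI Local Bus spec). */
#define PCI_STATUS          0x06  /* low byte of the Status register */
#define PCI_STATUS_CAP_LIST 0x10  /* Status bit 4: capability list present */
#define PCI_CAPABILITY_LIST 0x34  /* pointer to the first capability */
#define PCI_CAP_ID_VNDR     0x09  /* Vendor Specific capability ID */
#define PCI_CFG_SPACE_SIZE  256
#define PCI_FIND_CAP_TTL    48    /* bound the walk against cyclic lists */

/*
 * Hypothetical layout of a virtual migration capability.  Bytes 0-2
 * follow the spec's Vendor Specific capability format; the remaining
 * bytes would take over the role of the fixed 0xf0/0xf1 bytes.
 */
struct mig_vndr_cap {
    uint8_t cap_id;   /* PCI_CAP_ID_VNDR */
    uint8_t next;     /* next capability pointer, 0 = end of list */
    uint8_t len;      /* length of this capability in bytes */
    uint8_t version;  /* hypothetical: layout version */
    uint8_t ctrl;     /* hypothetical: host/guest control byte */
    uint8_t status;   /* hypothetical: guest/host status byte */
};

/* Return the offset of the first capability with ID cap_id, or 0 if none. */
static uint8_t find_pci_cap(const uint8_t cfg[PCI_CFG_SPACE_SIZE], uint8_t cap_id)
{
    if (!(cfg[PCI_STATUS] & PCI_STATUS_CAP_LIST))
        return 0;

    /* Capability pointers are dword aligned; the low two bits are reserved. */
    uint8_t pos = cfg[PCI_CAPABILITY_LIST] & 0xfc;
    for (int ttl = PCI_FIND_CAP_TTL; pos >= 0x40 && ttl > 0; ttl--) {
        if (cfg[pos] == cap_id)
            return pos;
        pos = cfg[pos + 1] & 0xfc;
    }
    return 0;
}
```

Because the capability is found by ID, it can be placed wherever a given
device leaves room, which avoids assuming 0xf0/0xf1 are unused.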

>
> Also, I haven't looked at the kernel-side patches yet, but the saved
> state received from and loaded into the PF driver needs to be versioned
> and maybe we need some way to know whether versions are compatible.
> Migration version information is difficult enough for QEMU; it's a
> completely foreign concept in the kernel.  Thanks,

Good point. We will add it in the next version.
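One possible shape for the versioning, as a minimal sketch: a hypothetical
header prepended to the state blob exchanged with the PF driver, plus a
destination-side check that requires the same major version and rejects a
newer minor version. The struct, magic value and policy are assumptions,
not an existing kernel interface; a real format would also fix the byte
order of the fields.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical magic identifying a VF state blob ("VFST" in ASCII). */
#define VF_STATE_MAGIC 0x56465354u

/* Version this (destination) side understands. */
#define VF_STATE_VERSION_MAJOR 1
#define VF_STATE_VERSION_MINOR 2

/* Hypothetical header prepended to the device state saved by the PF driver. */
struct vf_state_hdr {
    uint32_t magic;
    uint16_t version_major;  /* bumped on incompatible layout changes */
    uint16_t version_minor;  /* bumped on backward-compatible additions */
    uint32_t payload_len;    /* bytes of device state after the header */
};

/*
 * Check whether a saved-state blob can be loaded here.  Returns 0 if
 * compatible, or a negative code describing why it is rejected.
 */
static int vf_state_check(const void *buf, size_t len)
{
    struct vf_state_hdr hdr;

    if (len < sizeof(hdr))
        return -1;                       /* truncated header */
    memcpy(&hdr, buf, sizeof(hdr));      /* buf may be unaligned */
    if (hdr.magic != VF_STATE_MAGIC)
        return -2;                       /* not a VF state blob */
    if (hdr.version_major != VF_STATE_VERSION_MAJOR)
        return -3;                       /* incompatible layout */
    if (hdr.version_minor > VF_STATE_VERSION_MINOR)
        return -4;                       /* newer than we understand */
    if (hdr.payload_len > len - sizeof(hdr))
        return -5;                       /* truncated payload */
    return 0;
}
```

Checking the header before loading anything lets the destination refuse
incompatible state up front instead of misprogramming the VF.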


>
> Alex
>


-- 
Best regards
Tianyu Lan