On 2015年10月22日 02:39, Alex Williamson wrote: > On Thu, 2015-10-22 at 00:52 +0800, Lan Tianyu wrote: >> This patchset is Qemu part for live migration support for SRIOV NIC. >> kernel part patch information is in the following link. >> http://marc.info/?l=kvm&m=144544635330193&w=2 >> >> >> Lan Tianyu (3): >> Qemu: Add pci-assign.h to share functions and struct definition with >> new file >> Qemu: Add post_load_state() to run after restoring CPU state >> Qemu: Introduce pci-sriov device type to support VF live migration >> >> hw/i386/kvm/Makefile.objs | 2 +- >> hw/i386/kvm/pci-assign.c | 113 +---------------------- >> hw/i386/kvm/pci-assign.h | 109 +++++++++++++++++++++++ >> hw/i386/kvm/sriov.c | 213 ++++++++++++++++++++++++++++++++++++++++++++ >> include/migration/vmstate.h | 2 + >> migration/savevm.c | 15 ++++ >> 6 files changed, 344 insertions(+), 110 deletions(-) >> create mode 100644 hw/i386/kvm/pci-assign.h >> create mode 100644 hw/i386/kvm/sriov.c >> > Hi Lan, Hi Alex: Thanks a lot for your comments. It's very helpful. > > Seems like there are a couple immediate problems with this approach. > The first is that you're modifying legacy KVM device assignment, which > is deprecated upstream and not even enabled by some distros. VFIO is > the supported mechanism for doing PCI device assignment now and any > features like this need to be added there first. It's not only more > secure than legacy KVM device assignment, but it also doesn't limit this > to an x86-only solution. Surely you want to support 82599 VF migration > on other platforms as well. Yes, we will turn to VFIO and just uses legacy mode to show our idea as soon as possible. > > Using sysfs to interact with the PF is also problematic since that means > that libvirt needs to grant qemu access to these files, adding one more > layer to the stack. If we were to use VFIO, we could potentially enable > this through a save-state region on the device file descriptor and if > necessary, virtual interrupt channels for the device as well. This of > course implies that the kernel internal channels are made as general as > possible in order to support any PF driver. This sounds reasonable. > > That said, there are some nice features here. Using unused PCI config > bytes to communicate with the guest driver and enable guest-based page > dirtying is a nice hack. However, if we want to add this capability to > other devices, we're not always going to be able to use fixed addresses > 0xf0 and 0xf1. I would suggest that we probably want to create a > virtual capability in the config space of the VF, perhaps a Vendor > Specific capability. Obviously some devices won't have room for a full > capability in the standard config space, so we may need to optionally > expose it in extended config space. Those device would be limited to > only supporting migration in PCI-e configurations in the guest. Also, > plenty of devices make use of undefined PCI config space, so we may not > be able to simply add a capability to a region we think is unused, maybe > it needs to happen through reserved space in another capability or > perhaps defining a virtual BAR that unenlightened guest drivers would > ignore. The point is that we somehow need to standardize that so that > rather than implicitly know that it's at 0xf0/0xf1 on 82599 VFs. Yes, use "0xF0" and "0xF1" to show idea and it's need more effort to find the suitable place. Will research more. > > Also, I haven't looked at the kernel-side patches yet, but the saved > state received from and loaded into the PF driver needs to be versioned > and maybe we need some way to know whether versions are compatible. > Migration version information is difficult enough for QEMU, it's a > completely foreign concept in the kernel. Thanks, Good point. Will add it into next version. > > Alex > -- Best regards Tianyu Lan