From mboxrd@z Thu Jan 1 00:00:00 1970 From: Amit Shah Subject: Re: [PATCH 5/7] KVM/userspace: Device Assignment: Support for assigning PCI devices to guests Date: Wed, 24 Sep 2008 10:28:08 +0530 Message-ID: <200809241028.09067.amit.shah@redhat.com> References: <1222181695-23418-1-git-send-email-amit.shah@redhat.com> <1222181695-23418-6-git-send-email-amit.shah@redhat.com> <48D919A8.8040409@codemonkey.ws> Mime-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Cc: avi@redhat.com, kvm@vger.kernel.org, muli@il.ibm.com, benami@il.ibm.com, weidong.han@intel.com, allen.m.kay@intel.com To: Anthony Liguori Return-path: Received: from mx1.redhat.com ([66.187.233.31]:53677 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753620AbYIXFAR (ORCPT ); Wed, 24 Sep 2008 01:00:17 -0400 In-Reply-To: <48D919A8.8040409@codemonkey.ws> Content-Disposition: inline Sender: kvm-owner@vger.kernel.org List-ID: * On Tuesday 23 Sep 2008 22:00:32 Anthony Liguori wrote: > Amit Shah wrote: > > diff --git a/qemu/Makefile.target b/qemu/Makefile.target > > index 72f3db8..40eb273 100644 > > --- a/qemu/Makefile.target > > +++ b/qemu/Makefile.target > > @@ -616,6 +616,7 @@ OBJS+= ide.o pckbd.o ps2.o vga.o $(SOUND_HW) dma.o > > OBJS+= fdc.o mc146818rtc.o serial.o i8259.o i8254.o pcspk.o pc.o > > OBJS+= cirrus_vga.o apic.o parallel.o acpi.o piix_pci.o > > OBJS+= usb-uhci.o vmmouse.o vmport.o vmware_vga.o extboot.o > > +OBJS+= device-assignment.o > > This needs to be conditional on at least linux hosts, but probably also > kvm support. I didn't see any other file that's doing it. So I added this conditional in vl.c by having a #if defined(__linux__). That's how usb-linux.c does it as well. Is there a better way? Not the whole functionality needs kvm support. This should be able to work even without kvm (for example, when the guest is 1:1 mapped in the host address space). > > + /* FIXME: Add support for emulated MMIO for non-kvm guests */ > > + if (kvm_enabled()) { > > This doesn't work at all if kvm isn't enabled right? You should > probably bail out in the init if kvm isn't enabled. If this whole file > is included conditionally based on KVM support, then you don't have to > worry about using kvm_enabled() guards to conditionally compile out code. Non-kvm support is currently broken and should be fixed, but that can happen after we get this merged. I can temporarily add a check for kvm_enabled and bail out. > > + sprintf(dir, "/sys/bus/pci/devices/0000:%02x:%02x.%x/", > > + r_bus, r_dev, r_func); > > snprintf() It's guarded by the %02x modifiers; so this doesn't depend on user input. > > + strcpy(name, dir); > > + strcat(name, "config"); > > snprintf() ... and as a result, all these don't depend on user input. > > +#define MAX_ASSIGNED_DEVS 4 > > +struct { > > + char name[15]; > > + int bus; > > + int dev; > > + int func; > > + AssignedDevice *assigned_dev; > > +} assigned_devices[MAX_ASSIGNED_DEVS]; > > Any reason not to just use a list here? sys-queue.h makes that very easy. > > + fprintf(stderr, "Registered host PCI device %02x:%02x.%1x " > > + "(\"%s\") as guest device %02x:%02x.%1x\n", > > + r_bus, r_dev, r_func, e_dev_name, > > + pci_bus_num(e_bus), e_device, r_func); > > Please don't fprintf() unconditionally. OK; however, a vmdk file open does that so I though it was alright to do it. > A lot more checks are needed here to see if things can succeed. We > definitely should bail out if they can't. Bailing out is done in the out: label below. What else do you think can fail? I've taken care of all the cases that do fail IMO. > > + return pci_dev; > > +out: > > + pci_unregister_device(&pci_dev->dev); > > + return NULL; > > +} > > +/* > > + * Syntax to assign device: > > + * > > + * -pcidevice dev=bus:dev.func,dma=dma > > + * > > + * Example: > > + * -pcidevice host=00:13.0,dma=pvdma > > + * > > + * dma can currently only be 'none' to disable iommu support. > > Does it actually work if you disable iommu support? If the guest is 1:1 mapped. > > diff --git a/qemu/hw/device-assignment.h b/qemu/hw/device-assignment.h > > new file mode 100644 > > index 0000000..b77e484 > > --- /dev/null > > +++ b/qemu/hw/device-assignment.h > > +#include > > Don't think this is needed here. We use mmap(), so this is needed. > > +#include "qemu-common.h" > > +#include "pci.h" > > +#include > > Nor this. This isn't. > > diff --git a/qemu/hw/pc.c b/qemu/hw/pc.c > > index 6053103..4a611cc 100644 > > --- a/qemu/hw/pc.c > > +++ b/qemu/hw/pc.c > > + /* Initialize assigned devices */ > > + if (pci_enabled) { > > + int r = -1; > > + do { > > + init_assigned_device(pci_bus, &r); > > Why pass r by reference instead of just returning it? At any rate, you > should detect when this fails and gracefully terminate QEMU. 'r' is the count of the number of assigned devices -- mostly needed because we have the data stored in an array. If we migrate to a list, this can be relaxed. ATM, I start the guest without assigning the device. I haven't figured out a way to gracefully terminate qemu yet. > > --- a/qemu/vl.c > > +++ b/qemu/vl.c > > +#if defined(TARGET_I386) || defined(TARGET_X86_64) || defined(__linux__) > > + case QEMU_OPTION_pcidevice: > > + add_assigned_device(optarg); > > You should copy into an array, then in pc.c, iterate through the array > and call into add_assigned_device. Is there any benefit in doing this? We're moving the iterate out of vl.c to pc.c and both will happen at the same time. > > Regards, > > Anthony Liguori Thanks! Amit.