From mboxrd@z Thu Jan 1 00:00:00 1970 From: Anthony Liguori Subject: Re: [PATCH 5/7] KVM/userspace: Device Assignment: Support for assigning PCI devices to guests Date: Wed, 24 Sep 2008 10:07:24 -0500 Message-ID: <48DA57AC.1020903@codemonkey.ws> References: <1222181695-23418-1-git-send-email-amit.shah@redhat.com> <1222181695-23418-6-git-send-email-amit.shah@redhat.com> <48D919A8.8040409@codemonkey.ws> <200809241028.09067.amit.shah@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: avi@redhat.com, kvm@vger.kernel.org, muli@il.ibm.com, benami@il.ibm.com, weidong.han@intel.com, allen.m.kay@intel.com To: Amit Shah Return-path: Received: from ug-out-1314.google.com ([66.249.92.169]:7844 "EHLO ug-out-1314.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751681AbYIXPIa (ORCPT ); Wed, 24 Sep 2008 11:08:30 -0400 Received: by ug-out-1314.google.com with SMTP id k3so1866055ugf.37 for ; Wed, 24 Sep 2008 08:08:28 -0700 (PDT) In-Reply-To: <200809241028.09067.amit.shah@redhat.com> Sender: kvm-owner@vger.kernel.org List-ID: Amit Shah wrote: > * On Tuesday 23 Sep 2008 22:00:32 Anthony Liguori wrote: > >> Amit Shah wrote: >> > > >>> diff --git a/qemu/Makefile.target b/qemu/Makefile.target >>> index 72f3db8..40eb273 100644 >>> --- a/qemu/Makefile.target >>> +++ b/qemu/Makefile.target >>> @@ -616,6 +616,7 @@ OBJS+= ide.o pckbd.o ps2.o vga.o $(SOUND_HW) dma.o >>> OBJS+= fdc.o mc146818rtc.o serial.o i8259.o i8254.o pcspk.o pc.o >>> OBJS+= cirrus_vga.o apic.o parallel.o acpi.o piix_pci.o >>> OBJS+= usb-uhci.o vmmouse.o vmport.o vmware_vga.o extboot.o >>> +OBJS+= device-assignment.o >>> >> This needs to be conditional on at least linux hosts, but probably also >> kvm support. >> > > I didn't see any other file that's doing it. So I added this conditional in > vl.c by having a #if defined(__linux__). That's how usb-linux.c does it as > well. Is there a better way? > aio and compatfd currently do it this way. block-raw-win32 and block-raw-posix are this way. We're slowly moving things away from #ifdef #else #endif to conditional compilation. > Not the whole functionality needs kvm support. This should be able to work > even without kvm (for example, when the guest is 1:1 mapped in the host > address space). > KVM is needed for interrupt remapping though. That's something I don't see happening for normal userspace any time soon. >>> + /* FIXME: Add support for emulated MMIO for non-kvm guests */ >>> + if (kvm_enabled()) { >>> >> This doesn't work at all if kvm isn't enabled right? You should >> probably bail out in the init if kvm isn't enabled. If this whole file >> is included conditionally based on KVM support, then you don't have to >> worry about using kvm_enabled() guards to conditionally compile out code. >> > > Non-kvm support is currently broken and should be fixed, but that can happen > after we get this merged. > But it would take bouncing interrupts to userspace? I don't think that will ever happen upstream personally. At any rate, there's no point in even trying to support something like that until progress is made upstream on this front. > I can temporarily add a check for kvm_enabled and bail out. > > >>> + sprintf(dir, "/sys/bus/pci/devices/0000:%02x:%02x.%x/", >>> + r_bus, r_dev, r_func); >>> >> snprintf() >> > > It's guarded by the %02x modifiers; so this doesn't depend on user input. > strcpy or sprintf should never be used. It doesn't matter if it's safe in a particular instance. There are safer functions to use (like snprintf). All it takes is for someone to come along and change the /sys/bus path to be larger without adjusting the buffer size and everything goes to hell. It's inherently brittle. >>> + fprintf(stderr, "Registered host PCI device %02x:%02x.%1x " >>> + "(\"%s\") as guest device %02x:%02x.%1x\n", >>> + r_bus, r_dev, r_func, e_dev_name, >>> + pci_bus_num(e_bus), e_device, r_func); >>> >> Please don't fprintf() unconditionally. >> > > OK; however, a vmdk file open does that so I though it was alright to do it. > I obviously don't use vmdk or else I would have removed that by now :-) >> A lot more checks are needed here to see if things can succeed. We >> definitely should bail out if they can't. >> > > Bailing out is done in the out: label below. What else do you think can fail? > I've taken care of all the cases that do fail IMO. > > >>> + return pci_dev; >>> +out: >>> + pci_unregister_device(&pci_dev->dev); >>> + return NULL; >>> +} >>> > > >>> +/* >>> + * Syntax to assign device: >>> + * >>> + * -pcidevice dev=bus:dev.func,dma=dma >>> + * >>> + * Example: >>> + * -pcidevice host=00:13.0,dma=pvdma >>> + * >>> + * dma can currently only be 'none' to disable iommu support. >>> >> Does it actually work if you disable iommu support? >> > > If the guest is 1:1 mapped. > You mean with Andrea's reserved ram patches? >>> +#include >>> >> Don't think this is needed here. >> > > We use mmap(), so this is needed. > Ah. >>> + /* Initialize assigned devices */ >>> + if (pci_enabled) { >>> + int r = -1; >>> + do { >>> + init_assigned_device(pci_bus, &r); >>> >> Why pass r by reference instead of just returning it? At any rate, you >> should detect when this fails and gracefully terminate QEMU. >> > > 'r' is the count of the number of assigned devices -- mostly needed because we > have the data stored in an array. If we migrate to a list, this can be > relaxed. > > ATM, I start the guest without assigning the device. I haven't figured out a > way to gracefully terminate qemu yet. > In the case of hot plug, you fail the hot plug. If you start with device assignment, just doing an "exit" would be sufficient. >>> +#if defined(TARGET_I386) || defined(TARGET_X86_64) || defined(__linux__) >>> + case QEMU_OPTION_pcidevice: >>> + add_assigned_device(optarg); >>> >> You should copy into an array, then in pc.c, iterate through the array >> and call into add_assigned_device. >> > > Is there any benefit in doing this? We're moving the iterate out of vl.c to > pc.c and both will happen at the same time. > It's how everything else works. Regards, Anthony Liguori >> Regards, >> >> Anthony Liguori >> > > Thanks! > > Amit. >