From mboxrd@z Thu Jan 1 00:00:00 1970
From: Muli Ben-Yehuda
Subject: Re: [PATCH 5/6] device assignment: support for assigning PCI devices to guests
Date: Tue, 28 Oct 2008 17:53:05 +0200
Message-ID: <20081028155305.GE6737@il.ibm.com>
References: <1225188410-2222-1-git-send-email-muli@il.ibm.com>
 <1225188410-2222-2-git-send-email-muli@il.ibm.com>
 <1225188410-2222-3-git-send-email-muli@il.ibm.com>
 <1225188410-2222-4-git-send-email-muli@il.ibm.com>
 <1225188410-2222-5-git-send-email-muli@il.ibm.com>
 <1225188410-2222-6-git-send-email-muli@il.ibm.com>
 <490733B5.5010102@codemonkey.ws>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: avi@redhat.com, kvm@vger.kernel.org, weidong.han@intel.com,
 Ben-Ami Yassour1, amit.shah@redhat.com, allen.m.kay@intel.com
To: Anthony Liguori
Return-path:
Received: from mtagate3.uk.ibm.com ([195.212.29.136]:37986 "EHLO
 mtagate3.uk.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S1752029AbYJ1Pxz (ORCPT );
 Tue, 28 Oct 2008 11:53:55 -0400
Received: from d06nrmr1407.portsmouth.uk.ibm.com
 (d06nrmr1407.portsmouth.uk.ibm.com [9.149.38.185])
 by mtagate3.uk.ibm.com (8.13.8/8.13.8) with ESMTP id m9SFrnQ9047966
 for ; Tue, 28 Oct 2008 15:53:49 GMT
Received: from d06av02.portsmouth.uk.ibm.com
 (d06av02.portsmouth.uk.ibm.com [9.149.37.228])
 by d06nrmr1407.portsmouth.uk.ibm.com (8.13.8/8.13.8/NCO v9.1)
 with ESMTP id m9SFrnYV1114302
 for ; Tue, 28 Oct 2008 15:53:49 GMT
Received: from d06av02.portsmouth.uk.ibm.com (loopback [127.0.0.1])
 by d06av02.portsmouth.uk.ibm.com (8.12.11.20060308/8.13.3)
 with ESMTP id m9SFrmYh024329
 for ; Tue, 28 Oct 2008 15:53:49 GMT
Content-Disposition: inline
In-Reply-To: <490733B5.5010102@codemonkey.ws>
Sender: kvm-owner@vger.kernel.org
List-ID:

On Tue, Oct 28, 2008 at 10:45:57AM -0500, Anthony Liguori wrote:

>> +ifeq ($(USE_KVM), 1)
>> +OBJS+= device-assignment.o
>> +endif
>
> I don't think you want to build this on PPC so I think you need a
> stronger check.

Good point.
How about checking TARGET_BASE_ARCH = i386?

>> +static void assigned_dev_ioport_writel(void *opaque, uint32_t addr,
>> +                                       uint32_t value)
>> +{
>> +    AssignedDevRegion *r_access = opaque;
>> +    uint32_t r_pio = guest_to_host_ioport(r_access, addr);
>> +
>> +    DEBUG("%s: r_pio=%08x e_physbase=%08x r_virtbase=%08lx value=%08x\n",
>> +          r_pio, (int)r_access->e_physbase,
>> +          (unsigned long)r_access->r_virtbase, value);
>>
>
> The format doesn't match the parameter count.

Yep, already fixed.

>> +static void assigned_dev_ioport_map(PCIDevice *pci_dev, int region_num,
>> +                                    uint32_t addr, uint32_t size, int
>> type)
>> +{
>> +    AssignedDevice *r_dev = (AssignedDevice *) pci_dev;
>> +    AssignedDevRegion *region = &r_dev->v_addrs[region_num];
>> +    uint32_t old_port = region->u.r_baseport;
>> +    uint32_t old_num = region->e_size;
>> +    int first_map = (old_num == 0);
>> +    struct ioperm_data data;
>> +    int i;
>> +
>> +    region->e_physbase = addr;
>> +    region->e_size = size;
>> +
>> +    DEBUG("e_phys=0x%x r_baseport=%x type=0x%x len=%d region_num=%d \n",
>> +          addr, region->u.r_baseport, type, size, region_num);
>> +
>> +    memset(&data, 0, sizeof(data));
>> +
>> +    if (!first_map) {
>> +        data.start_port = old_port;
>> +        data.num = old_num;
>> +        data.turn_on = 0;
>> +
>> +        for (i = 0; i < smp_cpus; ++i)
>> +            kvm_ioperm(qemu_kvm_cpu_env(i), &data);
>>
>
> How does this interact with VCPU hot-plug?

I have no idea. Weidong?
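
Going back to the DEBUG() format mismatch above: the format string has
five conversions but the call passes only four arguments, so the leading
%s has nothing to print. One way to make them agree is to pass __func__
first. The following is only a standalone sketch, not the code from the
patch: DemoRegion, format_writel_debug() and the use of snprintf() in
place of the DEBUG macro are assumptions made so the snippet compiles on
its own.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the patch's AssignedDevRegion; only the
 * two fields printed by the debug message are modelled here. */
typedef struct {
    uint32_t e_physbase;   /* guest-visible base port */
    void *r_virtbase;      /* host-side mapping */
} DemoRegion;

/* Formats the writel debug message with one argument per conversion.
 * __func__ feeds the leading %s that the original call left unmatched.
 * Returns the snprintf() result (characters that would be written). */
static int format_writel_debug(char *buf, size_t len, const DemoRegion *r,
                               uint32_t r_pio, uint32_t value)
{
    return snprintf(buf, len,
                    "%s: r_pio=%08x e_physbase=%08x r_virtbase=%08lx value=%08x\n",
                    __func__, (unsigned)r_pio, (unsigned)r->e_physbase,
                    (unsigned long)(uintptr_t)r->r_virtbase, (unsigned)value);
}
```

Calling format_writel_debug(buf, sizeof(buf), &region, r_pio, value)
fills buf with a line prefixed by the function name. Building with
-Wformat (part of -Wall in GCC) flags this kind of mismatch at compile
time when the format is passed straight to a printf-style function.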
>> +#ifdef KVM_CAP_IOMMU
>> +        /* We always enable the IOMMU if present
>> +         * (or when not disabled on the command line)
>> +         */
>> +        r = kvm_check_extension(kvm_context, KVM_CAP_IOMMU);
>> +        if (r && !disable_iommu)
>> +            assigned_dev_data.flags |= KVM_DEV_ASSIGN_ENABLE_IOMMU;
>> +#endif
>> +        r = kvm_assign_pci_device(kvm_context, &assigned_dev_data);
>> +        if (r < 0) {
>> +            fprintf(stderr, "Could not notify kernel about "
>> +                    "assigned device \"%s\"\n", e_dev_name);
>> +            perror("register_real_device");
>> +            goto out;
>> +        }
>> +    }
>>
>
> You still succeed if KVM_CAP_DEVICE_ASSIGNMENT isn't defined? That
> means a newer userspace compiled on an older kernel will silently
> fail if they try to do device assignment. There's probably no
> reason to build this file if KVM_CAP_DEVICE_ASSIGNMENT isn't defined
> (see how the in-kernel PIT gets conditionally build depending on
> whether that cap is available).

Ok, I'll take a look at this.

>> +#endif
>> +    term_printf("Registered host PCI device %02x:%02x.%1x "
>> +                "(\"%s\") as guest device %02x:%02x.%1x\n",
>> +                r_bus, r_dev, r_func, e_dev_name,
>> +                pci_bus_num(e_bus), e_device, r_func);
>>
>>
>
> If I read the code correctly, this term_printf() happens regardless
> of whether this is being done for PCI hotplug or for command-line
> assignment? That's a problem as it'll print garbage on the monitor
> when you start QEMU which could break management applications.

Is there a more suitable alternative or shall I just nuke it?

>> diff --git a/qemu/hw/pc.c b/qemu/hw/pc.c
>> index d559f0c..5fdb726 100644
>> --- a/qemu/hw/pc.c
>> +++ b/qemu/hw/pc.c
>> @@ -33,6 +33,7 @@
>>  #include "boards.h"
>>  #include "console.h"
>>  #include "fw_cfg.h"
>> +#include "device-assignment.h"
>>  #include "qemu-kvm.h"
>> @@ -1157,6 +1158,21 @@ static void pc_init1(ram_addr_t ram_size, int
>> vga_ram_size,
>>      if (pci_enabled)
>>          virtio_balloon_init(pci_bus);
>> +
>> +    if (kvm_enabled() && device_assignment_enabled) {
>> +        int i;
>>
>
> Stray tab.
Grrr. Silly emacs.

>
>> +        for (i = 0; i < assigned_devices_index; i++) {
>> +            if (add_assigned_device(assigned_devices[i]) < 0) {
>> +                fprintf(stderr, "Warning: could not add assigned device
>> %s\n",
>> +                        assigned_devices[i]);
>> +            }
>> +        }
>> +
>> +        if (init_all_assigned_devices(pci_bus)) {
>> +            fprintf(stderr, "Failed to initialize assigned devices\n");
>> +            exit (1);
>> +        }
>> +    }
>>  }
>> +#if defined(TARGET_I386) || defined(TARGET_X86_64) || defined(__linux__)
>> +            case QEMU_OPTION_pcidevice:
>> +                device_assignment_enabled = 1;
>> +                if (assigned_devices_index >= MAX_DEV_ASSIGN_CMDLINE) {
>> +                    fprintf(stderr, "Too many assigned devices\n");
>> +                    exit(1);
>> +                }
>> +                assigned_devices[assigned_devices_index] = optarg;
>> +                assigned_devices_index++;
>> +                break;
>>
>
> Tab damage.

Thanks, will fix in the next revision.

Cheers,
Muli
-- 
The First Workshop on I/O Virtualization (WIOV '08)
Dec 2008, San Diego, CA, http://www.usenix.org/wiov08/
                      <->
SYSTOR 2009---The Israeli Experimental Systems Conference
http://www.haifa.il.ibm.com/conferences/systor2009/