From mboxrd@z Thu Jan 1 00:00:00 1970 From: Yongji Xie Subject: Re: [Qemu-devel] [PATCH v2] vfio: Add support for mmapping sub-page MMIO BARs Date: Mon, 5 Sep 2016 18:25:38 +0800 Message-ID: <0e1154c2-8f1d-db55-f3c6-9e2423e55048@linux.vnet.ibm.com> References: <1470913557-4355-1-git-send-email-xyjxie@linux.vnet.ibm.com> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit Cc: qemu-devel@nongnu.org, kvm@vger.kernel.org, aik@ozlabs.ru, zhong@linux.vnet.ibm.com, gwshan@linux.vnet.ibm.com To: alex.williamson@redhat.com Return-path: Received: from mx0a-001b2d01.pphosted.com ([148.163.156.1]:56912 "EHLO mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753859AbcIEKZv (ORCPT ); Mon, 5 Sep 2016 06:25:51 -0400 Received: from pps.filterd (m0098399.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.0.17/8.16.0.17) with SMTP id u85AOJPR119562 for ; Mon, 5 Sep 2016 06:25:50 -0400 Received: from e23smtp07.au.ibm.com (e23smtp07.au.ibm.com [202.81.31.140]) by mx0a-001b2d01.pphosted.com with ESMTP id 257tp18r37-1 (version=TLSv1.2 cipher=AES256-SHA bits=256 verify=NOT) for ; Mon, 05 Sep 2016 06:25:50 -0400 Received: from localhost by e23smtp07.au.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Mon, 5 Sep 2016 20:25:48 +1000 Received: from d23relay08.au.ibm.com (d23relay08.au.ibm.com [9.185.71.33]) by d23dlp01.au.ibm.com (Postfix) with ESMTP id 7198C2CE8056 for ; Mon, 5 Sep 2016 20:25:45 +1000 (EST) Received: from d23av01.au.ibm.com (d23av01.au.ibm.com [9.190.234.96]) by d23relay08.au.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id u85APjv66685164 for ; Mon, 5 Sep 2016 20:25:45 +1000 Received: from d23av01.au.ibm.com (localhost [127.0.0.1]) by d23av01.au.ibm.com (8.14.4/8.14.4/NCO v10.0 AVout) with ESMTP id u85APiML005277 for ; Mon, 5 Sep 2016 20:25:45 +1000 In-Reply-To: <1470913557-4355-1-git-send-email-xyjxie@linux.vnet.ibm.com> Sender: kvm-owner@vger.kernel.org List-ID: Ping? On 2016/8/11 19:05, Yongji Xie wrote: > Now the kernel commit 05f0c03fbac1 ("vfio-pci: Allow to mmap > sub-page MMIO BARs if the mmio page is exclusive") allows VFIO > to mmap sub-page BARs. This is the corresponding QEMU patch. > With those patches applied, we could passthrough sub-page BARs > to guest, which can help to improve IO performance for some devices. > > In this patch, we expand MemoryRegions of these sub-page > MMIO BARs to PAGE_SIZE in vfio_pci_write_config(), so that > the BARs could be passed to KVM ioctl KVM_SET_USER_MEMORY_REGION > with a valid size. The expanding size will be recovered when > the base address of sub-page BAR is changed and not page aligned > any more in guest. And we also set the priority of these BARs' > memory regions to zero in case of overlap with BARs which share > the same page with sub-page BARs in guest. > > Signed-off-by: Yongji Xie > --- > hw/vfio/common.c | 3 +-- > hw/vfio/pci.c | 76 ++++++++++++++++++++++++++++++++++++++++++++++++++++++ > 2 files changed, 77 insertions(+), 2 deletions(-) > > diff --git a/hw/vfio/common.c b/hw/vfio/common.c > index b313e7c..1a70307 100644 > --- a/hw/vfio/common.c > +++ b/hw/vfio/common.c > @@ -661,8 +661,7 @@ int vfio_region_setup(Object *obj, VFIODevice *vbasedev, VFIORegion *region, > region, name, region->size); > > if (!vbasedev->no_mmap && > - region->flags & VFIO_REGION_INFO_FLAG_MMAP && > - !(region->size & ~qemu_real_host_page_mask)) { > + region->flags & VFIO_REGION_INFO_FLAG_MMAP) { > > vfio_setup_region_sparse_mmaps(region, info); > > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c > index 7bfa17c..7035617 100644 > --- a/hw/vfio/pci.c > +++ b/hw/vfio/pci.c > @@ -1057,6 +1057,65 @@ static const MemoryRegionOps vfio_vga_ops = { > }; > > /* > + * Expand memory region of sub-page(size < PAGE_SIZE) MMIO BAR to page > + * size if the BAR is in an exclusive page in host so that we could map > + * this BAR to guest. But this sub-page BAR may not occupy an exclusive > + * page in guest. So we should set the priority of the expanded memory > + * region to zero in case of overlap with BARs which share the same page > + * with the sub-page BAR in guest. Besides, we should also recover the > + * size of this sub-page BAR when its base address is changed in guest > + * and not page aligned any more. > + */ > +static void vfio_sub_page_bar_update_mapping(PCIDevice *pdev, int bar) > +{ > + VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev); > + VFIORegion *region = &vdev->bars[bar].region; > + MemoryRegion *mmap_mr, *mr; > + PCIIORegion *r; > + pcibus_t bar_addr; > + > + /* Make sure that the whole region is allowed to be mmapped */ > + if (!(region->nr_mmaps == 1 && > + region->mmaps[0].size == region->size)) { > + return; > + } > + > + r = &pdev->io_regions[bar]; > + bar_addr = r->addr; > + if (bar_addr == PCI_BAR_UNMAPPED) { > + return; > + } > + > + mr = region->mem; > + mmap_mr = ®ion->mmaps[0].mem; > + memory_region_transaction_begin(); > + if (memory_region_size(mr) < qemu_real_host_page_size) { > + if (!(bar_addr & ~qemu_real_host_page_mask) && > + memory_region_is_mapped(mr) && region->mmaps[0].mmap) { > + /* Expand memory region to page size and set priority */ > + memory_region_del_subregion(r->address_space, mr); > + memory_region_set_size(mr, qemu_real_host_page_size); > + memory_region_set_size(mmap_mr, qemu_real_host_page_size); > + memory_region_add_subregion_overlap(r->address_space, > + bar_addr, mr, 0); > + } > + } else { > + /* This case would happen when guest rescan one PCI device */ > + if (bar_addr & ~qemu_real_host_page_mask) { > + /* Recover the size of memory region */ > + memory_region_set_size(mr, r->size); > + memory_region_set_size(mmap_mr, r->size); > + } else if (memory_region_is_mapped(mr)) { > + /* Set the priority of memory region to zero */ > + memory_region_del_subregion(r->address_space, mr); > + memory_region_add_subregion_overlap(r->address_space, > + bar_addr, mr, 0); > + } > + } > + memory_region_transaction_commit(); > +} > + > +/* > * PCI config space > */ > uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len) > @@ -1139,6 +1198,23 @@ void vfio_pci_write_config(PCIDevice *pdev, > } else if (was_enabled && !is_enabled) { > vfio_msix_disable(vdev); > } > + } else if (ranges_overlap(addr, len, PCI_BASE_ADDRESS_0, 24) || > + range_covers_byte(addr, len, PCI_COMMAND)) { > + pcibus_t old_addr[PCI_NUM_REGIONS - 1]; > + int bar; > + > + for (bar = 0; bar < PCI_ROM_SLOT; bar++) { > + old_addr[bar] = pdev->io_regions[bar].addr; > + } > + > + pci_default_write_config(pdev, addr, val, len); > + > + for (bar = 0; bar < PCI_ROM_SLOT; bar++) { > + if (old_addr[bar] != pdev->io_regions[bar].addr && > + pdev->io_regions[bar].size > 0 && > + pdev->io_regions[bar].size < qemu_real_host_page_size) > + vfio_sub_page_bar_update_mapping(pdev, bar); > + } > } else { > /* Write everything to QEMU to keep emulated bits correct */ > pci_default_write_config(pdev, addr, val, len);